PREFACE


This book, with apologies for the pretentious title, represents the text of a course 
we have been teaching at Harvard for the past eight years. The course is aimed 
at students with an interest in physics who have a good grounding in one- 
variable calculus. Some prior acquaintance with linear algebra is helpful but not 
necessary. Most of the students simultaneously take an intensive course in physics 
and so are able to integrate the material learned here with their physics education. 
This also is helpful but not necessary. The main topics of the course are the theory
and physical application of linear algebra, and of the calculus of several variables,

calculus. Our pedagogical 

sophistication and range of application, rather than the ‘rectilinear approach’ of
strict logical order. There are, we hope, no vicious circles of logical error, but we
will frequently develop a special case of a subject, and then return to it for a more
general definition and setting only after a broader perspective can be achieved
through the introduction of related topics. This makes some demands of patience
and faith on the part of the student. But we hope that, at the end, the student is
rewarded by a deeper intuitive understanding of the subject as a whole.


Here is an outline of the contents of the book in some detail. The goal of the 
first four chapters is to develop a familiarity with the algebra and analysis of 
square matrices. Thus, by the end of these chapters, the student should be thinking 
of a matrix as an object in its own right, and not as a square array of numbers. 
We deal in these chapters almost exclusively with 2 x 2 matrices, where the most
complicated of the computations can be reduced to solving quadratic equations.
But we always formulate the results with the higher-dimensional case in mind. We
begin Chapter 1 by explaining the relation between the multiplication law of 2 x 2
matrices and the geometry of straight lines in the plane. We develop the algebra
of 2 x 2 matrices and discuss the determinant and its relation to area and
orientation. We define the notion of an abstract vector space, in general, and
explain the concepts of basis and change of basis for one- and two-dimensional
vector spaces. 




In Chapter 2 we discuss conformal linear geometry in the plane, that is, the
geometry of lines and angles, and its relation to certain kinds
quantum mechanics. We use these notions to give an algorithm for computing
the powers of a matrix. As an application we study the basic properties of Markov
chains.

The principal goal of Chapter 3 is to explain that a system of homogeneous
linear differential equations with constant coefficients can be written as du/dt = Au
where A is a matrix and u is a vector, and that the solution can be written as
e^{At}u_0 where u_0 gives the initial conditions. This of course requires us to explain
what is meant by the exponential of a matrix. We also describe the qualitative 
behavior of solutions and the inhomogeneous case, including a discussion of 
resonance. 

Chapter 4 is devoted to the study of scalar products and quadratic forms. It is 
rich in physical applications, including a discussion of normal modes and a detailed 
treatment of special relativity. 

Chapters 5 and 6 present the basic facts of the differential calculus. In Chapter 5 
we define the differential of a map from one vector space to another, and discuss 


directional and partial derivatives, and linear differential forms.

In Chapter 6 we continue the study of the differential calculus. We present the

function theorem. We discuss critical point behavior and Lagrange multipliers. 
Chapters 7 and 8 are meant as a first introduction to the integral calculus. 
one-dimensional integrals such as arc length are also discussed.

under pullback is stressed. The two-dimensional version of Stokes’ theorem, i.e. 
Green’s theorem, is proved. Surface integrals in three-space are studied. 

Chapter 9 presents an example of how the results of the first eight chapters can 
be applied to a physical theory - optics. It is all in the nature of applications, and 
can be omitted without any effect on the understanding of what follows. 

In Chapter 10 we go back and prove the basic facts about finite-dimensional 
vector spaces and their linear transformations. The treatment here is a straightforward
generalization, in the main, of the results obtained in the first four chapters
in the two-dimensional case. The one new algorithm is that of row reduction. Two
important new concepts (somewhat hard to get used to at first) are introduced:
those of the dual space and the quotient space. These concepts will prove crucial
in what follows.

matrices. The subject is developed axiomatically, and the basic computational
algorithms are presented.

Chapters 12-14 are meant as a gentle introduction to the mathematics of shape, 
that is, algebraic topology. In Chapter 12 we begin the study of electrical networks.
This involves two aspects. One is the study of the ‘wiring’ of the network, that is, 
how the various branches are interconnected. In mathematical language this is 
known as the topology of one-dimensional complexes. The other is the study of 
how the network as a whole responds when we know the behavior of the individual 
branches, in particular, power and energy response. We give some applications to 
physically interesting networks. 

In Chapter 13 we continue the study of electrical networks. We examine the 
boundary-value problems associated with capacitive networks and use these 
methods to solve some classical problems in electrostatics involving conductors. 

In Chapter 14 we give a sketch of how the one-dimensional results of Chapters 12 
and 13 generalize to higher dimensions. 


Chapters 15-18 develop the exterior differential calculus as a continuous version 
of the discrete theory of complexes. In Chapter 15 the basic facts of the exterior 
Stokes’ theorem.


of the vacuum give the continuous analog of the capacitance of a network, and 



dimensional space. The basic facts of potential theory are presented.

Chapter 17 continues the study of the exterior differential calculus. The main



applied to magnetostatics. 

Chapter 18 concludes the study of the exterior calculus with an in-depth
discussion of the star operator in a general context.


Chapter 19 can be thought of as the culmination of the course. It applies the 
results of the preceding chapters to the study of Maxwell’s equations and the 
associated wave equations. 

Chapters 20 and 21 are essentially independent of Chapters 9-19 and can be 
read independently of them. They are not usually included in our one-year course. 
But Chapters 1-9, 20 and 21 would form a self-contained unit for a shorter course. 

The material in Chapter 20 is a relatively standard treatment of the theory of 
functions of a complex variable, suitable for students at the level of this book. 

Chapter 21 discusses some of the more elementary aspects of asymptotics. 

Chapter 22 shows how the exterior calculus can be used in classical thermodynamics,
following the ideas of Born and Carathéodory.

The book is divided into two volumes, with Chapters 1-11 in volume I.

Most of the mathematics and all of the physics presented in this book were
developed by the first decade of the twentieth century. The material is thus at
least seventy-five years old. Yet much of the material is not yet standard in the










quaternions, a theory which had a good deal of popularity in England in the 
middle of the nineteenth century. Quaternions had several drawbacks: they more 
naturally pertained to four, rather than to three dimensions - the geometry of 
three dimensions appeared as a piece of a larger theory rather than having a 
natural existence of its own; also, they have too much algebraic structure, the 
relation between quaternion multiplication, for example, and geometric constructions
in three dimensions being somewhat complicated. (The first of these objections
would, of course, be regarded far less seriously today. But it would be replaced by
an objection to a theory that is limited to four dimensions.) Eventually, the three-
dimensional vector algebra with its scalar and vector products was distilled from
the theory of quaternions. It was conjoined with the necessary differential
So vector analysis, with its grad, div, curl etc. became the standard language in
which the geometric laws of physics were taught. Now while vector analysis is
well suited to the geometry of three-dimensional Euclidean space, it has a number
of serious drawbacks. First, and least serious, is that the essential unity of the 
subject is obscured. Thus the fundamental theorem of the calculus, Green’s theorem, 
Gauss’ theorem and Stokes’ theorem are all aspects of the same theorem (now
called Stokes’ theorem). But this is not at all clear in the vector analysis treatment.
More serious is that the fundamental operators involve the Euclidean structure 

Euclidean space. A related problem is that the operators do not behave nicely
under general changes of coordinates - their expression in non-rectangular
coordinates being unwieldy. Already Poincaré, in his fundamental scientific and
philosophical writings which led to the theory of relativity, stressed the need to 
distinguish between those laws of geometry and physics which are ‘topological’, 
i.e. depend only on the differential structure of space and so are invariant under 
smooth deformations, and those which depend on more geometrical structure such 
as the notion of distance. One of the major impacts of the theory of relativity on 
mathematics was to encourage the study of higher-dimensional spaces, a study 
which had existed in the previous mathematical literature, but was not regarded 
as central to the study of geometry. Another was to emphasize general coordinate 

jumble of indices has a number of serious drawbacks, the most serious
being that it is extraordinarily difficult to tell which operations have any geometric
significance and which are artifacts of the coordinate system. Thus, while it is
reasonably well-suited for computation, it is hard to assess exactly what it is that
one is computing. The whole purpose of the development initiated by Hamilton - to
have a calculus whose objects have a perceived geometrical significance - was 
vitiated. In order to make the theory work one had to introduce a relatively 
sophisticated geometrical construct, such as an affine connection. Even with such 
constructs the geometric meanings of the operations are obscure. In fact tensor 
analysis never displaced the intuitively clear vector analysis from the elementary 
curriculum. 

It is generally accepted in the mathematics community, and gradually being 
accepted in the physics community, that the most suitable framework for geometrical
analysis is the exterior differential calculus of Grassmann and Cartan. This
calculus has the advantage that its computational rules are simple and concise,
that its objects have a transparent geometrical significance, that it works in all
behaves well under maps and changes of coordinates, that it



etween 




take on a simple and elegant form in terms of the exterior calculus. To emphasize 
this point, it might be useful to reproduce the above table, taken from Thirring’s 
Course on Mathematical Physics. 

Hermann Grassmann (1809-77) published his Ausdehnungslehre in 1844. It was 
the mathematical community and was dismissed by the leading
university position in mathematics. He remained a high-school teacher throughout
his career. (Nevertheless, he seemed to have a happy and productive life. He raised a
large family and was recognized as an expert on Sanskrit literature.) Towards the
end of his life he tried again, with another edition of his Ausdehnungslehre, but this
fared no better than the first. Only one or two mathematicians of his time, such as
Möbius, appreciated his work. Nevertheless, the Ausdehnungslehre (or calculus of
extension) contains for the first time many of the notions central to modern
mathematics and most of the algebraic structures used in this book. Thus vector


Elie Cartan (1869-1951) is now universally recognized as the leading geometer 
of our century. His early work, of such overwhelming importance for modern
mathematics, on Lie groups and on systems of partial differential equations was 
done in relative obscurity. But, by the 1920s, his work became known to the broad 
mathematical community, due, in part, to the writings of Hermann Weyl who 
presented novel expositions of his work at a time when the theory of Lie groups
began to play a central role in mathematics and in physics. Cartan’s work on the
theory of principal bundles and connections is now basic to the theory of elementary
Leçons sur les invariants intégraux
the exterior differential calculus, which he had invented, was a flexible tool, not 
only for geometry but also for the variational calculus and a wide variety of 
physical applications. It has taken a while, but, as we have mentioned above, it 
is now recognized by mathematicians and physicists that this calculus is the 
curriculum and have proceeded accordingly.

Some explanation is in order for the time and effort devoted to the theory of
electrical networks, a subject not usually considered as part of the elementary
curriculum. First of all there is a purely pedagogical justification. The subject 
always goes over well with the students. It provides a down-to-earth illustration 
of such concepts as dual space and quotient space, concepts which frequently seem 
overly abstract and not readily accepted by the student. Also, in the discrete, 
algebraic setting of network theory, Stokes’ theorem appears as essentially a 
Stokes’ theorem in the setting of the exterior calculus. There are deeper, more
philosophical reasons for our decision to emphasize network theory. It has been

together are essentially electrical in character. Thus (in the approximation where
the notion of rigid body and Euclidean geometry makes sense, that is, in the 
non-relativistic realm) the concept of a rigid body, and hence of Euclidean geometry, 
derives from electrostatics. The frontiers of physics, both in the very small (the 
study of elementary particles) and the very large (the study of cosmology) have 
already begun to reopen fundamental questions as to the geometry of space and 
time. We thought it wise to bring some of the issues relating geometry to physics 
before the student even at this early stage of the curriculum. The advent of the 
computer, and also some of the recent theories of physics will, no doubt, call into 
question the discrete versus the continuous character of space and time (an issue 
raised by Riemann in his dissertation on the foundations of geometry). It is to be 
hoped that our discussion may be of some use to those who will have to deal with 
this problem in the future.

Of course, we have had to omit several important topics due to the limitation
of a one-year course. We do not discuss infinite-dimensional vector spaces, in
particular Hilbert spaces, nor do we define or study abstract differentiable manifolds
and their properties. It has been our experience that these topics make too heavy
a demand on the sophistication of the student, and the effort involved in explaining 
them is best expended elsewhere. Of course, at various places in the text we have 
to pay the price for not having these concepts at our disposal. More serious is the 
omission of a serious discussion of Fourier analysis, classical mechanics and 


require a semester’s course, and substantive treatments from the modern viewpoint 
are available elsewhere. A suggested guide to further reading is given at the end 
of the book. 

We would like to thank Prof. Daniel Goroff for a careful reading of the 
manuscript and for making many corrections and fruitful suggestions for improvement.
We would also like to thank Jeane Morris for her excellent typing and
her devoted handling of the production of the manuscript from the inception of 
the project to its final form, over a period of eight years. 
In Chapter 1 we explain the relation between the multiplication
law of 2 x 2 matrices and the geometry of straight
lines in the plane. We develop the algebra of 2 x 2 matrices 
and discuss the determinant and its relation to area and 
orientation. We define the notion of an abstract vector space, 
in general, and explain the concepts of basis and change of 
basis for one- and two-dimensional vector spaces. 


1.1. Affine planes and vector spaces 

The familiar Euclidean plane of high-school plane geometry arose early in the 
history of mathematics because its properties are readily discovered by physical 
experiments with a tabletop or blackboard. Through our experience in using rulers 
and protractors, we are inclined to accept ‘length’ and ‘angle’ as concepts which 
are as fundamental as ‘point’ and ‘line’. We frequently have occasion, though, both 
in pure mathematics and in its applications to physics and other disciplines, to 
consider planes for which straight lines are defined but in which no general notion 
of length is defined, or in which the usual Euclidean notion of length is not 
appropriate. Such a plane may be represented on a sheet of paper, but the physical 
distance between two points on the paper, as measured by a ruler, or the angle 
between two lines, as measured by a protractor, need have no significance. 

An example of such a plane is the one used to describe graphically the motion 
of particles along a line (the x-axis). A point P or Q in this plane represents the 
physical concept of event, something which has a time and place. A line l also

Figure 1.2








The lines of the affine plane $A\mathbb{R}^2$ can be described in various ways. One way is
to give an equation satisfied by the points of the line, for example

$$l = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; ax + by = c \right\}.$$

This is to be read as ‘l is the set of points $\begin{bmatrix} x \\ y \end{bmatrix}$ such that the equation ax + by = c
is satisfied’. Here it is assumed that a and b are not both zero.

This method of characterizing a line is a little inconvenient because the parameters
a, b, c which characterize the line are not unique. For example

$$\left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; ax + by = c \right\} \quad\text{and}\quad \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; 3ax + 3by = 3c \right\}$$

are the same line. More generally the parameters ra, rb, rc, for r ≠ 0, describe the
same straight line as a, b, c.


on the line. Given two distinct points $P_0 = \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ and $P_1 = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$, we can describe
the line through $P_0$ and $P_1$ as the set of all points

$$\begin{bmatrix} x_0 + t(x_1 - x_0) \\ y_0 + t(y_1 - y_0) \end{bmatrix}$$

as t varies over the real numbers. This description of a line is
even more redundant than the previous one: we can replace our points $P_1$ and
$P_0$ by any other pair of distinct points on the same line.


A third way of describing a straight line (a more ‘dynamic’ as
opposed to a ‘static’ way) is to give a point on the line and the ‘direction vector
of the line’: thus the set of all points of the form

$$\begin{bmatrix} x_0 \\ y_0 \end{bmatrix} + t \begin{pmatrix} u \\ v \end{pmatrix}, \quad t \in \mathbb{R}, \quad \text{where } \begin{pmatrix} u \\ v \end{pmatrix} \neq \begin{pmatrix} 0 \\ 0 \end{pmatrix} \text{ is a fixed vector,}$$

is a line. (Here we think of the line as being traversed by a particle moving with
‘velocity vector’ $\begin{pmatrix} u \\ v \end{pmatrix}$ and situated at $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ at time zero.) Here we have used
four parameters to describe the line. But we can multiply $\begin{pmatrix} u \\ v \end{pmatrix}$ by any non-zero
scalar and get the same line (just traversed with different velocity) and we can
displace $\begin{bmatrix} x_0 \\ y_0 \end{bmatrix}$ along the line, showing that we have two redundant parameters.












Of course, this ties in with our second description if $u = x_1 - x_0$ and $v = y_1 - y_0$.

the awkward feature that it does not describe absolutely all lines in the same way.

$$\left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; y = ax + b \right\}$$

is a straight line which intersects the y-axis at the point $\begin{bmatrix} 0 \\ b \end{bmatrix}$ and which has ‘slope’
a; i.e., for points on the line, an increase in one unit of x implies an increase in a
units of y. This set is a line, and the description is not redundant, for we have
described a and b in terms of geometric properties of the line. But not all lines
are of this form. We must add the lines which are parallel to the y-axis, and which
have the description

$$\left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; x = d \right\}.$$
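
The agreement between these descriptions can be checked numerically. The following is a minimal sketch in Python, not part of the original text: the function names `implicit_from_two_points`, `point_at` and `on_line` are illustrative choices, and the two sample points are arbitrary. It converts a two-point description into parameters a, b, c, confirms that the points $P_0$ + t(u, v) satisfy ax + by = c, and confirms that 3a, 3b, 3c describe the same line. Exact rational arithmetic is used so that the equality tests are reliable.

```python
from fractions import Fraction

def implicit_from_two_points(p0, p1):
    """Return (a, b, c) such that a*x + b*y = c holds on the line through p0 and p1."""
    (x0, y0), (x1, y1) = p0, p1
    if (x0, y0) == (x1, y1):
        raise ValueError("the two points must be distinct")
    a = y1 - y0            # a*(x1 - x0) + b*(y1 - y0) == 0, so p1 also satisfies the equation
    b = x0 - x1
    c = a * x0 + b * y0    # chosen so that p0 satisfies the equation
    return a, b, c

def point_at(p0, direction, t):
    """Point reached from p0 at 'time' t when moving with velocity `direction`."""
    (x0, y0), (u, v) = p0, direction
    return (x0 + t * u, y0 + t * v)

def on_line(abc, p):
    """True if the point p satisfies a*x + b*y = c."""
    a, b, c = abc
    return a * p[0] + b * p[1] == c

# Arbitrary sample data (an assumption made for illustration only).
p0 = (Fraction(1), Fraction(2))
p1 = (Fraction(4), Fraction(3))

abc = implicit_from_two_points(p0, p1)
direction = (p1[0] - p0[0], p1[1] - p0[1])   # u = x1 - x0, v = y1 - y0
samples = [point_at(p0, direction, Fraction(t, 2)) for t in range(-6, 7)]

same_line = all(on_line(abc, p) for p in samples)
scaled = tuple(3 * k for k in abc)           # the parameters 3a, 3b, 3c
same_after_scaling = all(on_line(scaled, p) for p in samples)
print(abc, same_line, same_after_scaling)
```

A finite sample of values of t is of course only a spot check; the general statement follows by substituting the parametric form into ax + by = c.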

From a strictly logical point of view, we should take one of the four descriptions
given above as our definition of a straight line; for example, we should say that,
by definition, a line is a subset, l, of $A\mathbb{R}^2$ such that there are three real numbers
a, b, and c with a and b not both zero such that

$$l = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; ax + by = c \right\}.$$

We should then prove that such a subset can be given by any of the other three
descriptions. We shall not go into such logical niceties


it is important to remember that an affine plane has no origin and that it makes
no sense to add points of an affine plane. We attach no special significance to the
point $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, and we resist the temptation to add points
‘coordinate by coordinate’. There is, however, a closely related mathematical
structure, called a two-dimensional vector space, in which an operation of addition
is defined. We construct a vector space from an affine plane by associating with
any pair of points the ‘displacement vector’ PQ whose ‘tail’ is at P and whose
‘head’ is at Q. We denote vectors by lowercase bold letters: v, w, etc. A vector v is
also given as a pair of real numbers, for example $\mathbf{v} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}$. Notice that we use



Q 


Figure 1.3 

of as that displacement which carries the point $\begin{bmatrix} -3 \\ 2 \end{bmatrix}$ into $\begin{bmatrix} 2 \\ 4 \end{bmatrix}$, and, in general,
carries any point $P = \begin{bmatrix} x \\ y \end{bmatrix}$ into $Q = \begin{bmatrix} x + 5 \\ y + 2 \end{bmatrix}$.

Thus each vector v determines a (particular kind of) transformation of the affine 
plane into itself, a rigid translation of the whole plane. If P is any point in the 
plane, we will denote the displaced point Q by P“ + ”v: the “ + ” is a symbol for 
this operation of vectors on points. Thus v sends P into Q = P“ + ”v. Explicitly, 


if $P = \begin{bmatrix} x \\ y \end{bmatrix}$ and $\mathbf{v} = \begin{pmatrix} a \\ b \end{pmatrix}$, then

$$P \text{ “+” } \mathbf{v} = \begin{bmatrix} x + a \\ y + b \end{bmatrix}.$$

The “+” relates points and vectors, and so differs from the usual
notion of addition. Similarly, given any pair of points P and Q, there is a unique
vector v = Q “−” P such that

P “+” v = Q.
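
The distinction between points and vectors can be made concrete in code. The following is a minimal sketch in Python, offered as an illustration rather than as anything prescribed by the text: the class names `Point` and `Vector` are hypothetical, and the sample point is chosen arbitrarily. A vector acts on a point to give a point, two points give a vector, and adding two points is deliberately left undefined.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Vector:
    a: float
    b: float

    def __add__(self, other):
        # Vector + vector: ordinary componentwise addition.
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.a + other.a, self.b + other.b)

@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def __add__(self, v):
        # P "+" v: a vector displaces a point to another point.
        if not isinstance(v, Vector):
            return NotImplemented   # point + point is not defined
        return Point(self.x + v.a, self.y + v.b)

    def __sub__(self, other):
        # Q "-" P: the unique vector carrying P into Q.
        if not isinstance(other, Point):
            return NotImplemented
        return Vector(self.x - other.x, self.y - other.y)

P = Point(-3, 2)          # arbitrary sample point (an assumption for illustration)
v = Vector(5, 2)
Q = P + v                 # the displaced point
recovered = Q - P         # should be v again

try:
    P + Q                 # adding two points makes no sense
    points_can_be_added = True
except TypeError:
    points_can_be_added = False
```

Returning `NotImplemented` lets Python raise a `TypeError` for point-plus-point, which mirrors the remark that it makes no sense to add points of an affine plane.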

We put quotation marks around the − because it relates different kinds of objects:
it gives a vector from a pair of points. You should convince yourself, by working


the same vector, i.e., Q “−” P = S “−” R, if and only if PQ and RS are opposite




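We define the sum of two vectors: if $\mathbf{u} = \begin{pmatrix} a \\ b \end{pmatrix}$ and $\mathbf{v} = \begin{pmatrix} c \\ d \end{pmatrix}$, we define the
sum by

$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} a + c \\ b + d \end{pmatrix}.$$

Then

$$P \text{ “+” } (\mathbf{u} + \mathbf{v}) = (P \text{ “+” } \mathbf{u}) \text{ “+” } \mathbf{v} \tag{1.1}$$

since, if $\mathbf{u} = \begin{pmatrix} a \\ b \end{pmatrix}$, $\mathbf{v} = \begin{pmatrix} c \\ d \end{pmatrix}$ and $P = \begin{bmatrix} x \\ y \end{bmatrix}$, then both the left and the right hand
side of the above equation equal $\begin{bmatrix} a + c + x \\ b + d + y \end{bmatrix}$. The equation (1.1) says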

that the displacement corresponding to u + v can be obtained by successively 
applying the displacement v and then the displacement u. Notice that u + v = v + u. 
We can visualize the addition of vectors by the familiar parallelogram law: if we 
start with a point P and write R = P “ + ” u, Q = P “ + ”v and S = P “ + ” (u + v), then 
the four points P, Q, S, R lie at the four vertices of a parallelogram. You should 
convince yourself of this fact by working out some examples on graph paper. The 


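proof of this fact goes as follows. For any vector $\mathbf{v} = \begin{pmatrix} a \\ b \end{pmatrix}$ and any real number
t, define $t\mathbf{v} = \begin{pmatrix} ta \\ tb \end{pmatrix}$. If $\mathbf{v} \neq \begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and P is any point, the set

$$l = \{ P \text{ “+” } t\mathbf{v} \} \quad (\text{as } t \text{ varies over } \mathbb{R})$$

is a straight line passing through P (just look at the third of our four descriptions
of straight lines). If R is some other point, consider the line

$$m = \{ R \text{ “+” } s\mathbf{v} \} \quad (\text{as } s \text{ varies over } \mathbb{R}).$$

If l and m have a point in common, then there are numbers $s_1$ and $t_1$ such that

$$R \text{ “+” } s_1\mathbf{v} = P \text{ “+” } t_1\mathbf{v}$$

which means that

$$R = P \text{ “+” } (t_1 - s_1)\mathbf{v}$$

and hence, for every s, that

$$R \text{ “+” } s\mathbf{v} = P \text{ “+” } (s + t_1 - s_1)\mathbf{v}.$$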

This means that the lines m and l coincide. In other words, either the lines l and
m coincide, or they do not intersect, i.e., either they are the same or they are
parallel. Now let us go back to our diagram for vector addition. If v ≠ 0 then
the points P and Q = P “+” v lie on the line l through P, and the points R and S = R “+” v lie
on the line m through R. There are now two possibilities. If R does not
lie on the line l, so that u ≠ tv for any t, the lines l and m are parallel. A similar
argument applies to the other two sides and we conclude that the figure is a
parallelogram. If u = tv, then all four points lie on the line l. We can still view this
picture as a sort of ‘degenerate’ parallelogram:


Figure 1.5 


furthe 


Figure 1.6 

We say that the vectors u and v are linearly dependent if there are numbers r and s, not
both zero, such that

$$r\mathbf{u} + s\mathbf{v} = \mathbf{0}.$$

If r ≠ 0 we can solve this equation for u to obtain u = −(s/r)v and if s ≠ 0 we can
solve this equation for v = −(r/s)u. In either case, the ‘addition parallelogram’
degenerates into segments on a line (or, if u = v = 0, into a single point). This
is the reason for the term linearly dependent. If two vectors are not linearly
dependent, we say that they are linearly independent.
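
As a small illustration of this definition, the following Python sketch looks for an explicit dependence relation between two vectors of the plane. It relies on a criterion that is not stated in the passage above and is assumed here as a standard fact: for u = (a, b) and v = (c, d), a relation ru + sv = 0 with r and s not both zero exists exactly when ad − bc = 0. The function names `dependence_relation` and `combine` are hypothetical.

```python
def combine(r, u, s, v):
    """Return the vector r*u + s*v for vectors given as pairs."""
    return (r * u[0] + s * v[0], r * u[1] + s * v[1])

def dependence_relation(u, v):
    """Return (r, s), not both zero, with r*u + s*v = 0, or None if u and v
    are linearly independent."""
    (a, b), (c, d) = u, v
    if a * d - b * c != 0:
        return None            # assumed criterion: independent when ad - bc != 0
    if (a, b) == (0, 0):
        return (1, 0)          # 1*0 + 0*v = 0
    if a != 0:
        return (c, -a)         # c*u - a*v = (0, bc - ad) = (0, 0)
    return (d, -b)             # here a = 0 and b != 0, which forces c = 0

rel = dependence_relation((1, 2), (3, 6))     # v = 3u, so a relation exists
check = combine(rel[0], (1, 2), rel[1], (3, 6))
independent = dependence_relation((1, 0), (0, 1))
```

When a relation (r, s) is returned, at least one of r and s is non-zero, so one of the two vectors can be written as a multiple of the other, exactly as in the text.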


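The zero vector $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$, denoted by 0, has the same point for its head and tail.
It is called an additive identity because

$$\mathbf{0} + \mathbf{v} = \mathbf{v} + \mathbf{0} = \mathbf{v} \quad \text{for all } \mathbf{v}.$$

The set of all vectors $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$ where x and y are arbitrary real numbers is
called $\mathbb{R}^2$. The space $\mathbb{R}^2$ is an example of a vector space, to be defined in the
next section. The notational distinction between $\mathbb{R}^2$ and $A\mathbb{R}^2$ lies in the fact that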



1.2. Vector spaces and their affine spaces 

It is easy to check that the operations of addition of vectors in $\mathbb{R}^2$ and of multiplying
vectors by real numbers satisfy the following collection of axioms:

Laws for addition of vectors

Associative law of addition:

(u + v) + w = u + (v + w).

Commutative law of addition:

u + v = v + u.

Existence of additive identity: there is a vector 0 such that

0 + v = v + 0 = v for all v.

Existence of additive inverse: for every v there is a −v such that

v + (−v) = 0.


Laws involving the multiplication of vectors by real numbers

‘One’ acts as multiplicative identity: 1v = v for every v.

Associative and distributive laws: for any real numbers r and s and any
vectors u and v

(rs)v = r(sv)
(r + s)v = rv + sv
r(u + v) = ru + rv.

By definition, a vector space is a collection, V, of objects, u, v, etc., called vectors,
such that we are given a binary operation, +, which assigns to every pair of vectors
u and v a third vector u + v and a multiplication which assigns to every real number
t and every vector v another vector tv such that the above axioms hold.
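
The claim that pairs of real numbers satisfy these axioms can be spot-checked by machine. The sketch below, in Python, is an illustration under stated assumptions and not a proof: the helper names `add`, `scale` and `axioms_hold` are hypothetical, vectors are represented as pairs of exact fractions, and only a small finite sample of vectors and scalars is tested.

```python
from fractions import Fraction
from itertools import product

def add(u, v):
    """Componentwise addition of two vectors given as pairs."""
    return (u[0] + v[0], u[1] + v[1])

def scale(r, v):
    """Multiplication of the vector v by the number r."""
    return (r * v[0], r * v[1])

ZERO = (Fraction(0), Fraction(0))

def axioms_hold(u, v, w, r, s):
    """Check each vector space axiom for one particular choice of data."""
    neg_v = scale(-1, v)
    return all([
        add(add(u, v), w) == add(u, add(v, w)),             # associativity
        add(u, v) == add(v, u),                             # commutativity
        add(ZERO, v) == v and add(v, ZERO) == v,            # additive identity
        add(v, neg_v) == ZERO,                              # additive inverse
        scale(1, v) == v,                                   # 'one' acts as identity
        scale(r * s, v) == scale(r, scale(s, v)),           # (rs)v = r(sv)
        scale(r + s, v) == add(scale(r, v), scale(s, v)),   # (r + s)v = rv + sv
        scale(r, add(u, v)) == add(scale(r, u), scale(r, v)),  # r(u + v) = ru + rv
    ])

numbers = [Fraction(-2), Fraction(1, 2), Fraction(3)]   # arbitrary sample scalars
vectors = list(product(numbers, repeat=2))              # nine sample vectors

all_ok = all(
    axioms_hold(u, v, w, r, s)
    for u, v, w in product(vectors, repeat=3)
    for r, s in product(numbers, repeat=2)
)
```

Fractions are used instead of floating-point numbers because rounding can make laws such as associativity fail in the last digit, which would obscure the point being illustrated.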

We have verified that $\mathbb{R}^2$ is an example of a vector space. As a second example,
we could take $\mathbb{R}^3$ where a vector now consists of a triplet

$$\mathbf{v} = \begin{pmatrix} a \\ b \\ c \end{pmatrix}$$

of real numbers. Addition of vectors is done componentwise as in $\mathbb{R}^2$:

$$\text{if } \mathbf{v}_1 = \begin{pmatrix} a_1 \\ b_1 \\ c_1 \end{pmatrix} \text{ and } \mathbf{v}_2 = \begin{pmatrix} a_2 \\ b_2 \\ c_2 \end{pmatrix}, \text{ then } \mathbf{v}_1 + \mathbf{v}_2 = \begin{pmatrix} a_1 + a_2 \\ b_1 + b_2 \\ c_1 + c_2 \end{pmatrix}.$$

The space $\mathbb{R}^3$ is just the space of vectors in our familiar three-dimensional space.
We shall study the concept of dimension later on. We could also consider the
space $\mathbb{R} = \mathbb{R}^1$ of the real numbers themselves as a vector space. Here addition is just
the ordinary addition and multiplication ordinary multiplication. When we 
introduce the notion of dimension, this will be an example of a one-dimensional 
vector space. 

As a different looking example of a vector space, consider the collection of all
polynomials. We can add two polynomials:

$$(1 + 3x + 7x^2) + (2 - x^2 + x^4 - x^6) = 3 + 3x + 6x^2 + x^4 - x^6,$$

just add the coefficients. We can also multiply a polynomial by a real number:

$$7(1 + 3x + 3x^2) = 7 + 21x + 21x^2.$$

We can also consider the space of polynomials of at most a given degree. For example, the
most general polynomial of degree at most two is of the form

$$P = ax^2 + bx + c.$$

The sum of two such polynomials

$$P_1 = a_1x^2 + b_1x + c_1 \quad\text{and}\quad P_2 = a_2x^2 + b_2x + c_2$$

is

$$P_1 + P_2 = (a_1 + a_2)x^2 + (b_1 + b_2)x + c_1 + c_2.$$

For example, if

$$P_1 = 3x^2 + 2x + 1, \qquad P_2 = 7x^2 - 10x + 2$$

then

$$P_1 + P_2 = 10x^2 - 8x + 3.$$

The set of polynomials of degree at most two is also a vector space. Notice that it
‘looks like’ $\mathbb{R}^3$ in the sense that the preceding equations look like

$$\begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} + \begin{pmatrix} 7 \\ -10 \\ 2 \end{pmatrix} = \begin{pmatrix} 10 \\ -8 \\ 3 \end{pmatrix}.$$

We will return to this point later.
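
The way polynomials of degree at most two ‘look like’ triples of numbers can be seen directly by storing a polynomial as its list of coefficients. The following Python sketch is one possible illustration; the representation (coefficients listed from the constant term upward) and the names `poly_add`, `poly_scale` and `as_triple` are choices made here, not conventions taken from the text.

```python
from itertools import zip_longest

def poly_add(p, q):
    """Add two polynomials given as coefficient lists [c0, c1, c2, ...]."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_scale(r, p):
    """Multiply a polynomial by the real number r."""
    return [r * a for a in p]

def as_triple(p):
    """Return (a, b, c) for a polynomial a*x^2 + b*x + c of degree at most two."""
    if len(p) > 3:
        raise ValueError("polynomial has degree greater than two")
    c, b, a = (list(p) + [0, 0, 0])[:3]
    return (a, b, c)

# The examples from the text, constant term first.
s1 = poly_add([1, 3, 7], [2, 0, -1, 0, 1, 0, -1])   # (1 + 3x + 7x^2) + (2 - x^2 + x^4 - x^6)
s2 = poly_scale(7, [1, 3, 3])                       # 7(1 + 3x + 3x^2)

P1 = [1, 2, 3]      # 3x^2 + 2x + 1
P2 = [2, -10, 7]    # 7x^2 - 10x + 2
s3 = poly_add(P1, P2)

# Adding the polynomials and adding the corresponding triples give the same answer.
componentwise = tuple(x + y for x, y in zip(as_triple(P1), as_triple(P2)))
```

Here `as_triple(s3)` and `componentwise` agree, which is the sense in which addition of such polynomials is the same computation as addition in three-dimensional coordinate space.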

Suppose that we are given a vector space V; for example, V could be $\mathbb{R}^1$, $\mathbb{R}^2$ or
$\mathbb{R}^3$. By an affine space associated to V, we mean a set A consisting of points P, Q,
etc., and an operation “+” which assigns to each P ∈ A and each v ∈ V another
point, denoted by P “+” v, such that the following axioms hold:

Associative law: (P “+” u) “+” v = P “+” (u + v) for any P ∈ A and any u, v ∈ V.

‘Zero’ acts as identity: P “+” 0 = P for any P ∈ A.

Transitivity: given any two points P and Q ∈ A, there is a
v ∈ V such that P “+” v = Q.

Faithfulness: if, for any P, the equality P “+” u = P “+” v
holds, then u = v.

Combining the last two axioms, we can say that, given any two points P and Q, 
there is a unique vector v such that P“ + ”v = Q. It is then sometimes convenient 
to write v = Q“ — ”P. 

The notion of a vector space and associated affine space lies at the basis of three 
centuries of physical thought, from Newtonian mechanics through special relativity 
and quantum mechanics. The purpose of the present chapter is to develop most 
of the key ideas in the study of these structures by examining the intuitively simple 
case of the two-dimensional* vector space $\mathbb{R}^2$. Let us begin, however, with some


* We will give a precise definition of the term ‘two-dimensional’ in §1.12, of ‘one-dimensional’ in
a few lines, and of the general concept of the dimension of a vector space in Chapter 10.



comments about the one-dimensional case. Here the concepts are so ‘obvious’ that
a detailed discussion of them may appear so pedantic as to be non-intuitive. Yet
it is worth the effort.

A vector space V is called one-dimensional if it satisfies the following two
conditions: (i) it possesses some vector v ≠ 0; and (ii) if v ≠ 0, then any u ∈ V can
be written as u = rv for some real number r. Notice that the r in this equation is
unique: if

$$r_1\mathbf{v} = r_2\mathbf{v},$$

then we claim that $r_1 = r_2$. Indeed, from $r_1\mathbf{v} = r_2\mathbf{v}$ we can write

$$(r_1 - r_2)\mathbf{v} = \mathbf{0}.$$

If $r_1 - r_2 \neq 0$, then setting $s = (r_1 - r_2)^{-1}$, we have

$$\mathbf{0} = s[(r_1 - r_2)\mathbf{v}] = (s(r_1 - r_2))\mathbf{v} = 1\mathbf{v} = \mathbf{v},$$

so v = 0, contradicting our original assumption that v ≠ 0. (You should check
exactly which of the vector space axioms we used at each stage of the preceding
argument.) Once we have chosen a v ≠ 0 in a one-dimensional vector space, then to
every vector u there corresponds a real number, r:

$$\mathbf{u} \mapsto r \quad \text{where } \mathbf{u} = r\mathbf{v}.$$

If $\mathbf{u}_1 = r_1\mathbf{v}$ and $\mathbf{u}_2 = r_2\mathbf{v}$, then $\mathbf{u}_1 + \mathbf{u}_2 = (r_1 + r_2)\mathbf{v}$. Thus $\mathbf{u}_1 + \mathbf{u}_2$ corresponds to
$r_1 + r_2$. Similarly, if u = rv and t is any real number, then tu = (tr)v so that tu
corresponds to tr. In short, every vector corresponds to a real number, and the


Isom orphism of th e one-dimensio nal vector spac e V with IR 1 . Thi s identification of 
V with R 1 depends on the choice of v. A choice o f v is called a choice of basis of 
V, and the number r associated to u via u = rv is cal led the coo rdi nate of u rela tive^ 


to the basis v. Suppose we choose a different basis, v'. Here v' = a\ where a is some 


non-zero real number. If u 


rv, then 

U = (ra -1 )at? 


so 


u = r v 


where 


r' = a 1 r. 


Thus, changing the basis by replacing $\mathbf{v}$ by $a\mathbf{v}$ has the effect of replacing
the coordinate $r$ of any vector by $a^{-1}r$. The
choice of a basis in a one-dimensional vector space is much like the choice of a 
unit for some physical quantity. If we change our units of mass from kilograms 
to grams, an object that weighs 1.3 kilograms now weighs 1300 grams. The difference 
is that, for many familiar physical quantities, the measurement of any object is 
given by positive numbers (or zero) only. It usually makes no sense to say that
something has negative volume or mass, etc. An exception is in the theory of
electricity, where electric charge can be positive or negative. For instance, we might
imagine situations in which we might want to choose the charge of the electron
as our unit. In terms of this basis, the electron would have charge +1 instead of
the negative value assigned to it by the traditional convention.


Let $A$ be an affine space associated to the one-dimensional vector space $V$. If
we choose a point $O$ in $A$ and a basis $\mathbf{v}$ of $V$, then every point $P$ of $A$ determines
a number $x(P)$, where

\[ P = O \mathbin{\text{“+”}} x(P)\mathbf{v}. \]

We call $x(P)$ the coordinate of $P$, but here we had to make two choices: we had
to choose an ‘origin’ $O$, which allowed us to identify points with vectors, and then
we had to choose a basis of $V$, which allowed us to identify vectors with numbers.
If we change our basis, by replacing $\mathbf{v}$ by $\mathbf{v}' = a\mathbf{v}$, then $x$ is replaced by $x'$ where

\[ x'(P) = a^{-1}x(P). \]

If, in addition, we replace $O$ by $O'$, where $O' = O \mathbin{\text{“+”}} \mathbf{w}$, then

\[ P \mathbin{\text{“−”}} O' = (P \mathbin{\text{“−”}} O) - \mathbf{w}. \]

If $\mathbf{w} = b\mathbf{v}'$, then this has the effect of replacing $x'$ by $x''$, where now

\[ x''(P) = a^{-1}x(P) - b. \]
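The change-of-coordinate rule above can be tried out numerically. The following is only a minimal sketch: the function name `change_coordinate` and the metres-to-centimetres example are illustrative assumptions, not part of the text.

```python
from fractions import Fraction


def change_coordinate(x, a, b):
    """Return x''(P) = x(P)/a - b.

    Assumes the basis v is replaced by v' = a*v (a non-zero) and the
    origin O is replaced by O' = O "+" b*v'.
    """
    if a == 0:
        raise ValueError("a must be non-zero, otherwise a*v is not a basis")
    return x / a - b


# Hypothetical example: a coordinate measured in metres from O is
# re-expressed in centimetres (v' = v/100) from an origin moved by 50 cm.
x_metres = Fraction(13, 10)
x_new = change_coordinate(x_metres, Fraction(1, 100), 50)
print(x_new)  # 80
```

Note that the new coordinate is again an affine function of the old one, which is why either coordinate is an equally legitimate description of the line.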


concept 


time. Newton wrote:
its own nature and independent of anything external; relative, apparent and
common time is some measure of duration by means of motion (as by the motion


In our terminology, what Newton said is that there exists a concept of absolute
time, and that the set of all times has the structure of a one-dimensional affine
space. The idea of flowing evenly and equably is made mathematically more
precise by the assertion that there is the action, given by “ + ”, of a one-dimensional 
vector space V on the set of all times. It is this postulated action which allows us 
to compare different intervals of time. Newton’s distinction between ‘true’ and 
‘common’ time corresponds to our discussion of the degree of arbitrariness involved 
in introducing coordinates on the affine line. 

We should pause for a moment and ponder over this abstract postulate of 
Newton, which lay at the cornerstone of physics for over two centuries. We have, 
each of us, our own psychological perception of time. Our psychological time 
differs in many important respects from Newton’s absolute time. The first striking 
difference is that for us time has a definite direction. The future is to some extent 
unknown and subject to our volition and intervention.
The past is, to some extent, known or remembered. Yet Newton’s laws of
motion are insensitive to the change of direction of time. If we were to
reverse the direction of time, the motion of a planetary system would still obey these
laws. The second difference is that our psychological time does not flow evenly.
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time interval: we get hungry a ‘certain amount of time’ after having had our last 
meal. But this is very variable, being determined by the level of our blood sugar, 
which in turn depends on what exactly we ate, what we have been doing in the 
interim, our overall physiological profile, etc. Also, our psychological perception 
of these intervals of time varies greatly. Time passes quickly when we are interested 
and excited by what we are doing, and slowly when we are bored. Nevertheless, 
our internal rhythms appear to be somewhat correlated to periodicities in the 
world about us; from the earliest records of civilization, the measurement of ‘external
time’, whether for civil or for scientific purposes, has always been based on the 
revolution of the celestial bodies. The period of apparent revolution of the sun, 
i.e., the interval between successive crossings of a meridian, has been the usual 
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and similarly the night. These subdivisions were marked off by various devices 





runs downhill.
sun on the celestial

accurate measure of time

observing some fixed star: the period between two successive transits of some fixed
star across some meridian line is a sidereal day. A civil day is, on the average, 
about four minutes longer than a sidereal day.) 

The earliest clocks seem to have come into use in Europe during the thirteenth 
century, but were highly inaccurate. The first major step in the improvement of 
the clock came in the seventeenth century when Galileo discovered that the time 
intervals between swings of a pendulum were constant (as measured against a 
normal pulse beat, for instance). He seems to have made little practical use of this 
information, except for the invention of a little instrument for doctors to use in 
measuring the pulse of their patients. His son, however, is said to have applied 
the pendulum to clocks. From then on, the development of mechanical clocks was 
fairly rapid. Thus, it was just around the time of Newton that one finally had a 
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clock is entirely conventional. It is also easily reversible. By a simple change of 
the gearing, we can make the hands rotate counterclockwise instead of clockwise. 
It is interesting to speculate how much the development of mechanical clocks had 
to do with Newton’s conception of time.


1.3. Functions and affine functions 

In the next few sections we will study those transformations of $A\mathbb{R}^2$ into itself
which carry straight lines into straight lines. We must begin with some general 
discussion of the notion of ‘transformation’ or ‘function’. 

Let $W$ and $X$ be sets. A rule $f: W \to X$ which assigns one element $f(w)$ of $X$ to
each $w \in W$ is called a function (or map, or mapping, or operator) from $W$ to $X$. The
set $W$ is called the domain of $f$. If $A$ is a subset of $W$, we let $f(A)$ denote the subset
of $X$ consisting of the elements $f(w)$ where $w \in A$:

\[ f(A) = \{ f(w) \mid w \in A \}. \]

The set $f(W)$ is called the image of $f$: in general, it is a subset of $X$.

For example, suppose $f$ is the map of $\mathbb{R}^2$ into itself given by $f(P) = P \mathbin{\text{“+”}} \mathbf{v}$
where $\mathbf{v}$ is a fixed vector. Then $f(A)$ is obtained from $A$ by ‘translating $A$ through
$\mathbf{v}$’. If $A = l = \{P + t\mathbf{u}\}$ is a line, then $f(l) = \{P + \mathbf{v} + t\mathbf{u}\}$ is another line. Thus the
image of a line under a translation is another line.



This notion of function is very general and powerful. The only restriction, really,
is that the ‘output’ of the function must be well-defined. It is not acceptable, for
example, to have a function $f: \mathbb{R} \to \mathbb{R}$ with the property that $f(1) = 2$ and $f(1) = 3$.
There would be nothing wrong, however, with a function $f: \mathbb{R} \to \mathbb{R}^2$ for which

\[ f(1) = \begin{pmatrix} 2 \\ 3 \end{pmatrix}. \]

Certain standard terminology concerning the domain and range of $f$ is worth
learning.

1. If two distinct elements $w_1, w_2 \in W$ are always mapped into distinct points
$x_1, x_2 \in X$, then $f$ is called injective (or one-to-one). Equivalently, $f$ is
injective if $f(w_1) = f(w_2)$ implies $w_1 = w_2$.

2. If $f(W) = X$, that is, if the equation $f(w) = x$ has at least one solution $w$ for every
$x \in X$, then $f$ is called surjective (or onto).

3. If $f$ is both injective and surjective, it is called bijective (or one-to-one onto).
Equivalently, $f$ is bijective if the equation $f(w) = x$ has a unique solution $w$
for each $x \in X$. In this case there exists a function $f^{-1}: X \to W$, called the
inverse of $f$, which maps each $x \in X$ into the unique $w$ for which $f(w) = x$.

Figure 1.8 may help you visualize why a function must be both injective and
surjective in order to be invertible.
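For functions between finite sets, the three definitions above can be checked by brute force. The sketch below is illustrative only; the helper names `is_injective`, `is_surjective` and `inverse` are assumptions introduced here, and the domain is assumed to be given as a list of distinct elements.

```python
def is_injective(f, domain):
    """True if distinct elements of the domain have distinct images."""
    images = [f(w) for w in domain]
    return len(set(images)) == len(images)


def is_surjective(f, domain, codomain):
    """True if every element of the codomain is f(w) for some w."""
    return {f(w) for w in domain} == set(codomain)


def inverse(f, domain, codomain):
    """Return the inverse of a bijective f as a dictionary x -> w."""
    if not (is_injective(f, domain) and is_surjective(f, domain, codomain)):
        raise ValueError("f is not bijective, so it has no inverse")
    return {f(w): w for w in domain}


# Squaring on {-1, 0, 1} is surjective onto {0, 1} but not injective.
print(is_injective(lambda w: w * w, [-1, 0, 1]))           # False
print(is_surjective(lambda w: w * w, [-1, 0, 1], [0, 1]))  # True
```

Asking for `inverse` of the squaring map raises an error, which mirrors the remark that the inverse would not be well-defined.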


Figure 1.8 (a) Surjective but not injective. Not invertible: $f^{-1}(x)$ would
not be well-defined. (b) Injective but not surjective. $f^{-1}(x_3)$ is not defined.
(c) Bijective (injective and surjective). $f^{-1}(x_1) = w_1$ and $f^{-1}(x_2) = w_2$.


In many cases we can describe a function by means of a formula. There are two
equivalent notations for associating the formula with the function. To describe
the familiar squaring function $F: \mathbb{R} \to \mathbb{R}$, for example, we may write either $F(x) = x^2$
or

\[ F: x \mapsto x^2. \]

Whichever notation we use, the symbol $x$ is a ‘dummy’ having nothing to do with $F$:
the same function is described by

\[ F(t) = t^2 \]

or by

\[ F: a \mapsto a^2. \]

A function described by a formula can involve more than one numerical
argument, for example

\[ G(x, y) = 2x + 3y \]

or

\[ G: (x, y) \mapsto 2x + 3y. \]

This function $G$ takes the ordered pair of numbers $(x, y)$ and produces the number
$2x + 3y$.

One further notion that applies to functions is that of composition. Let $W$, $X$, $Y$, $Z$
all denote sets, and suppose we have functions

\[ f: W \to X, \qquad g: X \to Y, \qquad h: Y \to Z. \]

We denote the function which takes $w \in W$, operates on it with $f$ to obtain an
element of $X$, then operates on that element with $g$ to obtain an element of $Y$,
by $g \circ f$, called the composition of $g$ with $f$. More succinctly, $(g \circ f)(w) = g(f(w))$.
Notice that the composition

\[ h \circ g \circ f: W \to Z \]

($f$ followed by $g$ followed by $h$) is the same as $h \circ (g \circ f)$ or as $(h \circ g) \circ f$. Thus the
operation of composition obeys an ‘associative law’ just as does multiplication of
real numbers.
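The associative law for composition can be illustrated with three simple functions. This is a sketch only: the helper `compose` and the particular functions `f`, `g`, `h` are assumptions chosen for the example.

```python
def compose(g, f):
    """Return the composition g o f, i.e. the function w -> g(f(w))."""
    return lambda w: g(f(w))


f = lambda w: w + 1   # f: W -> X
g = lambda x: 2 * x   # g: X -> Y
h = lambda y: y ** 2  # h: Y -> Z

left = compose(h, compose(g, f))   # h o (g o f)
right = compose(compose(h, g), f)  # (h o g) o f

print(left(3), right(3))  # 64 64
```

Composition is associative but not commutative: here `compose(g, f)(1)` is 4 while `compose(f, g)(1)` is 3.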


We turn now to functions on affine lines and planes and on vector spaces,
beginning with one-dimensional examples which, although important, are so subtle
that they can easily be overlooked.


Let $A$ be an affine line, illustrated in Figure 1.9. Given any ruler, we can choose
an origin and orientation for this line and assign a coordinate to each point on
the line.

Figure 1.9 An affine line with points $P$, $R$, $Q$.

Mathematically speaking, we have chosen an origin, $O$, of $A$ and a basis,
$\mathbf{v}$, of $V$ as described in the last section. Thus we construct an affine coordinate
function

\[ x: A \to \mathbb{R}. \]

Of course, there are many possible affine coordinate functions on a line, and which
one we construct depends on our origin and unit of measurement. We call $x$ a
coordinate function because it is invertible: knowing $x(P)$, we can reconstruct $P$.
Notice that $x$ preserves the ‘interpolation property’ of a real affine line: if

\[ R = (1 - t)P + tQ, \]

then

\[ x(R) = (1 - t)x(P) + tx(Q). \]

For example, if $R$ is the midpoint of the segment $PQ$, then

\[ x(R) = \tfrac{1}{2}x(P) + \tfrac{1}{2}x(Q). \]


You have probably never thought of this $x$ as a function before. You cannot
write a formula for it. Yet you can hardly do elementary physics without it because
it is what lets you express other functions on a line in terms of formulas. If, for
example, the force which acts on a particle on a line is a function of position

\[ f: A \to \mathbb{R}, \]

you cannot write a formula for $f$, but you can introduce an affine coordinate
function $x: A \to \mathbb{R}$
and a function $F: \mathbb{R} \to \mathbb{R}$ and write $f(P) = F(x(P)) = (F \circ x)(P)$. This is what a formula
like Force $= \sin x$, used to represent a function on a line, really means.

Time is an affine line whose points are ‘instants’. The affine coordinate function
$t$ assigns a number to each instant. To define $t$ we use a clock. Clocks
which run at different rates lead to different functions $t$, but any ‘good’ clock yields
an affine function. A defective clock, for example a pendulum clock whose pendulum
varies in length because of temperature change, would yield a non-affine coordinate
function.

The motion of a particle along a straight line determines a function from one
real affine line $A_t$ (time) to another real affine line $A_x$ (space). This function
$f: A_t \to A_x$ acts on an instant of time $E$ to yield a point $P$ on the line, so that
$P = f(E)$. We cannot write a formula for $f$ because $E$ and $P$ are not numbers. If
we introduce affine coordinate functions $t$ on $A_t$ and $x$ on $A_x$, however, then the
motion is described by a function $F: \mathbb{R} \to \mathbb{R}$, where

\[ x(f(E)) = F(t(E)), \]

and $F$ can be represented by a formula.


1.4. Euclidean and affine transformations 


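A map $f: \mathbb{R}^2 \to \mathbb{R}^2$ is called a Euclidean transformation if $f$ preserves distance. This
means that for any two points $P_1 = \begin{bmatrix} x_1 \\ y_1 \end{bmatrix}$ and $P_2 = \begin{bmatrix} x_2 \\ y_2 \end{bmatrix}$, the distance from
$f(P_1)$ to $f(P_2)$ is the same as the distance from $P_1$ to $P_2$. If we express $f$ in terms
of two functions $\phi: \mathbb{R}^2 \to \mathbb{R}$ and $\psi: \mathbb{R}^2 \to \mathbb{R}$ so that

\[ f\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} \phi(x, y) \\ \psi(x, y) \end{bmatrix}, \]

then this condition says that

\[ [\phi(x_1, y_1) - \phi(x_2, y_2)]^2 + [\psi(x_1, y_1) - \psi(x_2, y_2)]^2 = (x_1 - x_2)^2 + (y_1 - y_2)^2 \]

for all values of $x_1, y_1, x_2, y_2$.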




Euclidean geometry can be thought of as the study of those properties of subsets
of the plane which are invariant under the application of any Euclidean
transformation. For instance, if $A$ is a circle and $f$ is a Euclidean transformation, then
$f(A)$ is again a circle. If $l$ is a straight line, then $f(l)$ is again a straight line. It is
clear from the definition that, if $f$ and $g$ are Euclidean transformations, then $g \circ f$
is again a Euclidean transformation.

An affine transformation is a one-to-one map $f$ of the plane into itself which carries
straight lines into straight lines. Thus $f(l)$ must be a straight line for any straight line $l$. For
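example, suppose $f$ is the transformation defined by

\[ f\begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} 2x + y + 1 \\ y - x + 5 \end{bmatrix}. \]

The most general straight line in the plane is given by an equation of the form

\[ ax + by + c = 0. \]

That is,

\[ l = \left\{ \begin{bmatrix} x \\ y \end{bmatrix} \;\middle|\; ax + by + c = 0 \right\}. \]

Then

\[ f(l) = \left\{ \begin{bmatrix} w \\ z \end{bmatrix} \;\middle|\; w = 2x + y + 1,\ z = y - x + 5 \text{ and } ax + by + c = 0 \right\}. \]

But we can solve the equations

\[ w = 2x + y + 1, \qquad z = y - x + 5 \]

for $x$ and $y$ in terms of $w$ and $z$:

\[ x = \tfrac{1}{3}(w - 1) - \tfrac{1}{3}(z - 5), \qquad y = \tfrac{1}{3}(w - 1) + \tfrac{2}{3}(z - 5), \]

so the condition

\[ ax + by + c = 0 \]

can be written as

\[ a[\tfrac{1}{3}(w - 1) - \tfrac{1}{3}(z - 5)] + b[\tfrac{1}{3}(w - 1) + \tfrac{2}{3}(z - 5)] + c = 0 \]

or as

\[ \tfrac{1}{3}(a + b)w + (\tfrac{2}{3}b - \tfrac{1}{3}a)z + c + \tfrac{4}{3}a - \tfrac{11}{3}b = 0. \]

In other words

\[ f(l) = \left\{ \begin{bmatrix} w \\ z \end{bmatrix} \;\middle|\; ew + gz + h = 0 \right\} \]

where

\[ e = \tfrac{1}{3}(a + b), \qquad g = \tfrac{2}{3}b - \tfrac{1}{3}a, \qquad h = c + \tfrac{4}{3}a - \tfrac{11}{3}b. \]

This is again a straight line.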
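The computation above can be spot-checked on a particular line. The following sketch assumes the transformation $(x, y) \mapsto (2x + y + 1,\ y - x + 5)$ from the example; the function names and the sample line $x + 2y - 3 = 0$ are illustrative choices.

```python
from fractions import Fraction as Fr


def f(x, y):
    """The affine transformation of the worked example."""
    return (2 * x + y + 1, y - x + 5)


def image_line(a, b, c):
    """Coefficients (e, g, h) of the image of the line ax + by + c = 0."""
    e = Fr(1, 3) * (a + b)
    g = Fr(2, 3) * b - Fr(1, 3) * a
    h = c + Fr(4, 3) * a - Fr(11, 3) * b
    return e, g, h


# Three points on the line x + 2y - 3 = 0.
points = [(3, 0), (1, 1), (-1, 2)]
e, g, h = image_line(1, 2, -3)
for x, y in points:
    w, z = f(x, y)
    print(e * w + g * z + h)  # 0 each time: the images lie on one line
```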
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lines (since distinct points go into distinct points). Thus / carries parallelograms 
into parallelograms. Thus the concept of a parallelogram makes sense in affine 
geometry (figure 1.10) (while the concept of rectangle or square does not 
(figure 1.11)). 



1.5. Linear transformations 

The simplest kind of affine (and Euclidean) transformations are the translations
$P \mapsto P \mathbin{\text{“+”}} \mathbf{v}$.
By a translation we can move any point of the plane into any other point. Before
proceeding further it is convenient to restrict attention to affine transformations
that keep one point, say $\begin{bmatrix} 0 \\ 0 \end{bmatrix}$, fixed. We can then get to any other point by
applying a translation.


Let $f$ be a one-to-one affine transformation which keeps the origin fixed. Choose
$O = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$ as the origin. We can now identify a point $P = \begin{bmatrix} x \\ y \end{bmatrix}$ with its position
vector $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$, so $P = O \mathbin{\text{“+”}} \mathbf{v}$. We shall, accordingly, drop the [ ] notation
and the distinction between $A\mathbb{R}^2$ and $\mathbb{R}^2$. Since $f$ carries parallelograms into
parallelograms, it follows immediately that, if the position vector of $Q$ is $\mathbf{v} + \mathbf{w}$,
the position vector of $f(Q)$ is $f(\mathbf{v}) + f(\mathbf{w})$. Therefore

\[ f(\mathbf{v} + \mathbf{w}) = f(\mathbf{v}) + f(\mathbf{w}), \tag{1.2} \]

if $\mathbf{v} \neq \mathbf{w}$.

Figure 1.12

We can now show that $f$ preserves ratios of segments along any line.
From the parallelogram spanned by $\mathbf{w}$, $\mathbf{v}$, $\mathbf{v} + \mathbf{w}$ and $2\mathbf{v}$, we see that

\[ f(2\mathbf{v}) = 2f(\mathbf{v}) \]

so (1.2) holds also when $\mathbf{v} = \mathbf{w}$.

Figure 1.13

By repeating the argument,

\[ f(n\mathbf{v}) = nf(\mathbf{v}) \]

for any integer $n \geq 0$. Applied to $(1/m)\mathbf{v}$, this implies

\[ f(a\mathbf{v}) = af(\mathbf{v}) \]

for any rational number $a \geq 0$.

From the parallelogram with vertices $\mathbf{0}$, $\mathbf{w}$, $-\mathbf{v} + \mathbf{w}$, $-\mathbf{v}$ we see that

\[ f(-\mathbf{v}) = -f(\mathbf{v}) \]

so that

\[ f(a\mathbf{v}) = af(\mathbf{v}) \]

for any rational number, $a$, positive or negative, and all $\mathbf{v}$.












It turns out that any one-to-one map of the plane into itself which keeps the origin fixed and
carries lines into lines in fact satisfies
$f(a\mathbf{v}) = af(\mathbf{v})$ for all real numbers $a$ and all vectors $\mathbf{v}$. The proof of this fact is a
little tricky, and we shall present it in an appendix at the end of this chapter. For
the moment we shall restrict attention to those affine transformations which do
satisfy $f(a\mathbf{v}) = af(\mathbf{v})$ for all real $a$, although, as we said, this turns out not to be a
restriction at all. For such $f$, we have the identity

\[ f(a\mathbf{v} + b\mathbf{w}) = af(\mathbf{v}) + bf(\mathbf{w}) \tag{1.3} \]

for any real numbers $a$ and $b$ and for any vectors $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^2$.

A map $f: \mathbb{R}^2 \to \mathbb{R}^2$ satisfying (1.3) is called a linear transformation of the plane.
We have converted the study of affine transformations of $\mathbb{R}^2$ which hold the
origin fixed into the study of linear transformations of the vector space $\mathbb{R}^2$.

Any map of $\mathbb{R}^2 \to \mathbb{R}^2$ satisfying (1.3) is linear, by definition. Not every linear
transformation is one-to-one. For example, the transformation which maps every
vector into $\mathbf{0}$ is linear but not one-to-one. If $f$ is linear, then

\[ f(\mathbf{v} + t\mathbf{w}) = f(\mathbf{v}) + tf(\mathbf{w}) \]

for all real $t$. If $f$ is also one-to-one, then $\mathbf{w} \neq \mathbf{0}$ implies $f(\mathbf{w}) \neq \mathbf{0}$, so $f$ carries the
line $\{\mathbf{v} + t\mathbf{w}\}$ into the line $\{f(\mathbf{v}) + tf(\mathbf{w})\}$. A linear transformation which is one-to-one
is called regular or non-singular. A linear transformation which is not one-to-one
is called singular. We have seen that every regular linear transformation is affine.
We shall see that the singular ones collapse the whole plane either into the origin
or into a line.

It is clear that if $f$ and $g$ are linear transformations (regular or not) then $g \circ f$ is
again a linear transformation. Indeed, $(g \circ f)(a\mathbf{v} + b\mathbf{w}) = g(af(\mathbf{v}) + bf(\mathbf{w}))$ since $f$ is
linear. Since $g$ is linear this equals $a(g \circ f)(\mathbf{v}) + b(g \circ f)(\mathbf{w})$, which shows that $g \circ f$ is linear.

To summarize: Linear transformations are, by definition, those $f$ which satisfy
(1.3) for all pairs of vectors $\mathbf{v}$ and $\mathbf{w}$ and all real numbers $a$ and $b$. An affine
transformation is a one-to-one map of $\mathbb{R}^2$ into itself which carries lines into lines.
Any affine transformation can be written as a (non-singular) linear transformation
followed by a translation; that is, any affine transformation $f$ satisfies

\[ f(\mathbf{w}) = \ell(\mathbf{w}) + \mathbf{v}, \]

where $\ell$ is a non-singular linear transformation and $\mathbf{v}$ is a fixed vector.
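To see the splitting of an affine transformation into a linear part and a translation in a concrete case, one can take the translation vector to be the image of the origin and subtract it off. The sketch below reuses the transformation $(x, y) \mapsto (2x + y + 1,\ y - x + 5)$ as its example; the helper name `split_affine` is an assumption made for illustration.

```python
def f(p):
    """An affine transformation of the plane (illustrative example)."""
    x, y = p
    return (2 * x + y + 1, y - x + 5)


def split_affine(f):
    """Return (linear_part, v) with f(w) = linear_part(w) + v."""
    v = f((0, 0))  # the translation is the image of the origin

    def linear_part(w):
        fx, fy = f(w)
        return (fx - v[0], fy - v[1])

    return linear_part, v


L, v = split_affine(f)
print(v)          # (1, 5)
print(L((1, 0)))  # (2, -1)
print(L((0, 1)))  # (1, 1)
```

The linear part obtained this way satisfies the identity (1.3); for instance `L((3, -2))` equals 3 times `L((1, 0))` minus 2 times `L((0, 1))`.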






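1.6. The matrix of a linear transformation

Let $f$ be a linear transformation. We can write any vector $\begin{pmatrix} x \\ y \end{pmatrix}$
in the plane as

\[ \begin{pmatrix} x \\ y \end{pmatrix} = x\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \end{pmatrix}, \]

so that

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = xf\begin{pmatrix} 1 \\ 0 \end{pmatrix} + yf\begin{pmatrix} 0 \\ 1 \end{pmatrix}. \]

This formula shows how $f$ is completely determined by what it does to the two
basis vectors $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Suppose that $f\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ c \end{pmatrix}$ and $f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} b \\ d \end{pmatrix}$.
Then $f$ is completely determined by the four numbers $a$, $b$, $c$, and $d$:

\[ f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}. \]

We write these four numbers as a square matrix

\[ \operatorname{Mat}(f) = \begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

where the first column is the image of $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and the second column is the image
of $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. The image of any point $\begin{pmatrix} x \\ y \end{pmatrix}$ is then given by

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} ax + by \\ cx + dy \end{pmatrix}. \tag{1.4} \]

We regard this formula as defining the multiplication of the vector $\begin{pmatrix} x \\ y \end{pmatrix}$
by the matrix $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$, to give another vector. It says to take the row $\times$ column for
each of the two components. Thus the top component is $ax + by$, which is obtained
from the top row $(a, b)$ of the matrix and the column $\begin{pmatrix} x \\ y \end{pmatrix}$. Similarly for the
bottom component.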

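For example, suppose that $R_\theta$ is counterclockwise rotation of the plane through
angle $\theta$. Then

\[ R_\theta\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} \cos\theta \\ \sin\theta \end{pmatrix} \quad \text{and} \quad R_\theta\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} -\sin\theta \\ \cos\theta \end{pmatrix}, \]

so that $R_\theta$ has the matrix

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}. \]

The image of any point $\begin{pmatrix} x \\ y \end{pmatrix}$ is given by

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} (\cos\theta)x - (\sin\theta)y \\ (\sin\theta)x + (\cos\theta)y \end{pmatrix}. \]

The formula (1.4) shows how to assign a linear transformation to each matrix. We
can thus identify $2 \times 2$ matrices with linear transformations of $\mathbb{R}^2$.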
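The passage from a linear transformation to its matrix, and back again via formula (1.4), can be sketched in a few lines. The names `matrix_of`, `apply` and `rotation` below are illustrative assumptions; matrices are represented as a pair of rows.

```python
import math


def matrix_of(f):
    """Matrix of a linear map f: the columns are f(1,0) and f(0,1)."""
    a, c = f((1.0, 0.0))
    b, d = f((0.0, 1.0))
    return ((a, b), (c, d))


def apply(m, v):
    """Multiply the column vector v by the 2x2 matrix m, as in (1.4)."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def rotation(theta):
    """Counterclockwise rotation through theta, as a function on vectors."""
    def r(v):
        x, y = v
        return (math.cos(theta) * x - math.sin(theta) * y,
                math.sin(theta) * x + math.cos(theta) * y)
    return r


quarter_turn = matrix_of(rotation(math.pi / 2))
# Up to rounding, quarter_turn is ((0, -1), (1, 0)), and it sends (1, 2) to (-2, 1).
image = apply(quarter_turn, (1.0, 2.0))
```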


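Suppose that $F$ is a linear transformation whose matrix is $\begin{pmatrix} a & b \\ c & d \end{pmatrix}$
and $G$ is a linear transformation whose matrix is $\begin{pmatrix} e & f \\ g & h \end{pmatrix}$.
Then $F \circ G$ is again a linear transformation. Its matrix is the matrix
whose first column is

\[ (F \circ G)\begin{pmatrix} 1 \\ 0 \end{pmatrix} = F\begin{pmatrix} e \\ g \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e \\ g \end{pmatrix} = \begin{pmatrix} ae + bg \\ ce + dg \end{pmatrix}. \]

The second column is

\[ (F \circ G)\begin{pmatrix} 0 \\ 1 \end{pmatrix} = F\begin{pmatrix} f \\ h \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} f \\ h \end{pmatrix} = \begin{pmatrix} af + bh \\ cf + dh \end{pmatrix}. \]

Thus we define the ‘multiplication’ of matrices to correspond to composition of
linear transformations, $(\operatorname{Mat} F) \times (\operatorname{Mat} G) = \operatorname{Mat}(F \circ G)$. The rule for multiplication
is

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix}. \]

For any position in the product matrix we take the same row from the first matrix
and the same column from the second matrix and multiply row by column.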
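The multiplication rule can be written out directly and compared with composition of the corresponding maps. This is a minimal sketch; the function names and the two sample matrices are illustrative assumptions.

```python
def apply(m, v):
    """Multiply the column vector v by the 2x2 matrix m."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)


def matmul(F, G):
    """Row-by-column product of two 2x2 matrices given as pairs of rows."""
    (a, b), (c, d) = F
    (e, f), (g, h) = G
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


F = ((1, 2), (0, 3))
G = ((4, 0), (0, 5))
v = (7, -2)

print(matmul(F, G))            # ((4, 10), (0, 15))
print(apply(matmul(F, G), v))  # (8, -30)
print(apply(F, apply(G, v)))   # (8, -30)
```

Applying the product matrix gives the same result as applying G first and then F, which is exactly the statement Mat(F) x Mat(G) = Mat(F o G).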




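For example, if $R_\theta$ is (counterclockwise) rotation through angle $\theta$ and $R_\phi$ is
rotation through angle $\phi$, then $R_\theta R_\phi = R_{\theta + \phi}$, and

\[ \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} \cos\phi & -\sin\phi \\ \sin\phi & \cos\phi \end{pmatrix} = \begin{pmatrix} \cos\theta\cos\phi - \sin\theta\sin\phi & -\cos\theta\sin\phi - \sin\theta\cos\phi \\ \sin\theta\cos\phi + \cos\theta\sin\phi & \cos\theta\cos\phi - \sin\theta\sin\phi \end{pmatrix}. \]

Comparing this with the matrix of $R_{\theta + \phi}$,

\[ \begin{pmatrix} \cos(\theta + \phi) & -\sin(\theta + \phi) \\ \sin(\theta + \phi) & \cos(\theta + \phi) \end{pmatrix}, \]

gives the standard trigonometric formulae for $\cos(\theta + \phi)$ and $\sin(\theta + \phi)$. Thus you
need no longer remember the identities for the sine and cosine of the sum of
two angles. You can derive them from the more general rule of matrix multiplication.
Notice that matrix multiplication, in general, is not commutative: for example,

\[ \begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix}\begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix} = \begin{pmatrix} 4 & 10 \\ 0 & 15 \end{pmatrix} \]

while

\[ \begin{pmatrix} 4 & 0 \\ 0 & 5 \end{pmatrix}\begin{pmatrix} 1 & 2 \\ 0 & 3 \end{pmatrix} = \begin{pmatrix} 4 & 8 \\ 0 & 15 \end{pmatrix}. \]

(Two rotations of $\mathbb{R}^2$ do commute with one another since it does not matter through
which angle we rotate first. But, in general, two matrices need not commute.)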
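The two claims about rotations, that their matrices multiply by adding angles and that they commute, can be checked numerically. The sketch below works in floating point, so equality is tested up to a small tolerance; the helper names and the two sample angles are assumptions made for the example.

```python
import math


def rot(theta):
    """Matrix of counterclockwise rotation through theta."""
    c, s = math.cos(theta), math.sin(theta)
    return ((c, -s), (s, c))


def matmul(F, G):
    """Row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = F
    (e, f), (g, h) = G
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def close(F, G, tol=1e-12):
    """Entrywise comparison of two matrices up to rounding error."""
    return all(abs(p - q) <= tol for rf, rg in zip(F, G) for p, q in zip(rf, rg))


theta, phi = 0.7, 1.9
print(close(matmul(rot(theta), rot(phi)), rot(theta + phi)))              # True
print(close(matmul(rot(theta), rot(phi)), matmul(rot(phi), rot(theta))))  # True
```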


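As an illustration of matrix multiplication, we prove a ‘triple product
decomposition’ which will be used later on. This decomposition states that any matrix
$\begin{pmatrix} a & b \\ c & d \end{pmatrix}$ with $a \neq 0$ can be written as a triple product
of the form

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix}\begin{pmatrix} r & 0 \\ 0 & s \end{pmatrix}\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}. \tag{1.5} \]

To prove this result we simply devise a procedure for determining $y$, $r$, $s$, and $x$.
We first multiply the matrices on the right. Since

\[ \begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix}\begin{pmatrix} r & 0 \\ 0 & s \end{pmatrix}\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} r & 0 \\ ry & s \end{pmatrix}\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} r & rx \\ ry & rxy + s \end{pmatrix}, \]

we want

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} r & rx \\ ry & rxy + s \end{pmatrix}. \]

Now we can equate corresponding entries in the left-hand and right-hand matrices.
First, $a = r$, and since by assumption $a \neq 0$, $r \neq 0$. Next, $b = rx$ and so $x = b/r = b/a$
(remember that $a \neq 0$). Similarly, $c = ry$ and so $y = c/r = c/a$. Finally, $d = rxy + s$
and so

\[ s = d - rxy = d - (bc)/a. \]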
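The procedure for finding $y$, $r$, $s$ and $x$ translates directly into a short computation. The sketch below uses exact rational arithmetic; the function names `triple_decomposition` and `rebuild` are illustrative assumptions, and the decomposition is the one with $a \neq 0$ described above.

```python
from fractions import Fraction


def matmul(F, G):
    """Row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = F
    (e, f), (g, h) = G
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def triple_decomposition(a, b, c, d):
    """Return (y, r, s, x) for the matrix ((a, b), (c, d)) with a != 0."""
    if a == 0:
        raise ValueError("this decomposition requires a != 0")
    a, b, c, d = (Fraction(t) for t in (a, b, c, d))
    r = a
    x = b / a
    y = c / a
    s = d - b * c / a
    return y, r, s, x


def rebuild(y, r, s, x):
    """Multiply the three factors back together."""
    lower = ((1, 0), (y, 1))
    diagonal = ((r, 0), (0, s))
    upper = ((1, x), (0, 1))
    return matmul(matmul(lower, diagonal), upper)


y, r, s, x = triple_decomposition(2, 3, 4, 5)
print(rebuild(y, r, s, x) == ((2, 3), (4, 5)))  # True
```

Note that the product $rs$ of the diagonal entries equals $ad - bc$, which is the quantity that reappears in the discussion of areas.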






A similar decomposition, important in the analysis of lens systems, is 


/ a 

b Wi 

if°-f 



\ c 

d) \0 ij 

°y 

tXxr 

V 


valid for any matrix with $c \neq 0$. The proof of this decomposition is simple: again,
just multiply out the triple product and equate corresponding matrix entries on
both sides of the equation.


1.8. Matrix algebra 


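Let $F$ and $G$ be two linear transformations of $\mathbb{R}^2$. We define their sum by

\[ (F + G)(\mathbf{v}) = F(\mathbf{v}) + G(\mathbf{v}). \]

Notice that

\[
\begin{aligned}
(F + G)(a\mathbf{v} + b\mathbf{w}) &= F(a\mathbf{v} + b\mathbf{w}) + G(a\mathbf{v} + b\mathbf{w}) \\
&= aF(\mathbf{v}) + bF(\mathbf{w}) + aG(\mathbf{v}) + bG(\mathbf{w}) \\
&= a(F(\mathbf{v}) + G(\mathbf{v})) + b(F(\mathbf{w}) + G(\mathbf{w})) \\
&= a(F + G)(\mathbf{v}) + b(F + G)(\mathbf{w}).
\end{aligned}
\]

Thus $F + G$ is again a linear transformation. It is clear that this addition is
associative and commutative, that the zero transformation, $0(\mathbf{v}) = \mathbf{0}$ for all $\mathbf{v}$, is
the zero for this addition and that $(-F)(\mathbf{v}) = -F(\mathbf{v})$ defines the negative of $F$, i.e.,
$(-F) + F = 0$, where $0$ in this equation stands for the zero linear transformation.
If $H$ is a third linear transformation, then composition, represented by matrix
multiplication, satisfies

\[
\begin{aligned}
H \circ (F + G)(\mathbf{v}) &= H[(F + G)(\mathbf{v})] \\
&= H[F(\mathbf{v}) + G(\mathbf{v})] \\
&= H[F(\mathbf{v})] + H[G(\mathbf{v})] \\
&= (H \circ F)(\mathbf{v}) + (H \circ G)(\mathbf{v})
\end{aligned}
\]

for all $\mathbf{v}$, or, in short,

\[ H \circ (F + G) = H \circ F + H \circ G, \]

and, similarly,

\[ (F + G) \circ H = F \circ H + G \circ H. \]

Thus multiplication is distributive relative to this addition.

It follows directly from the definition of the sum of linear transformations that if
the matrices of $F$ and $G$ are

\[ \operatorname{Mat}(F) = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \quad \text{and} \quad \operatorname{Mat}(G) = \begin{pmatrix} e & f \\ g & h \end{pmatrix}, \]

then

\[ \operatorname{Mat}(F + G) = \begin{pmatrix} a + e & b + f \\ c + g & d + h \end{pmatrix} = \operatorname{Mat}(F) + \operatorname{Mat}(G). \]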





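In other words, we add matrices by adding the entries at each position. We can
also multiply a matrix by a number: $2\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 2a & 2b \\ 2c & 2d \end{pmatrix}$. Notice that

\[ 2\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 2a & 2b \\ 2c & 2d \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}, \]

so multiplying a matrix by the number 2 is the same as multiplying it by the matrix
$\begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}$. The $2 \times 2$ matrices, with these
rules for addition and multiplication, satisfy most of the familiar rules for adding
and multiplying numbers. Thus: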

Addition is commutative and associative with the existence of a zero and a
negative;

Multiplication is associative with the existence of an identity, $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, and
is distributive over addition.

There are, however, two important differences: 

(1) multiplication is not commutative; 

(2) the product of two non-zero matrices can be zero, so the cancellation law 
for multiplication need not hold: 
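One way to see the failure of the cancellation law concretely is the following sketch. The particular matrices `A` and `B` are an illustrative choice made here and are not necessarily the example given in the original text.

```python
def matmul(F, G):
    """Row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = F
    (e, f), (g, h) = G
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


ZERO = ((0, 0), (0, 0))
A = ((0, 1), (0, 0))  # non-zero
B = ((1, 0), (0, 0))  # non-zero

print(matmul(A, B))  # ((0, 0), (0, 0))
print(matmul(B, A))  # ((0, 1), (0, 0))
```

Here $AB = A \cdot 0$ although $B \neq 0$, so $A$ cannot be cancelled; the same pair also shows once more that $AB$ and $BA$ can differ.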





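Instead of linear transformations of the vector space $\mathbb{R}^2$, we could consider linear
transformations of the vector space $\mathbb{R}^3$. A linear
transformation of $\mathbb{R}^3$ is now described by a $3 \times 3$ matrix of the form

\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}, \]

where the first column is the image of the first basis vector $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, etc. The formula for
multiplication is

\[ \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}\begin{pmatrix} b_{11} & b_{12} & b_{13} \\ b_{21} & b_{22} & b_{23} \\ b_{31} & b_{32} & b_{33} \end{pmatrix} = \begin{pmatrix} c_{11} & c_{12} & c_{13} \\ c_{21} & c_{22} & c_{23} \\ c_{31} & c_{32} & c_{33} \end{pmatrix} \]

where, for each $i$ and $j$,

\[ c_{ij} = a_{i1}b_{1j} + a_{i2}b_{2j} + a_{i3}b_{3j}. \]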

Again, for any position, the row from the first matrix multiplies the column from the
second. Thus, for example, taking $i = 2$ and $j = 3$ in the above formula, so that
$c_{23} = a_{21}b_{13} + a_{22}b_{23} + a_{23}b_{33}$, corresponds to the diagram

Figure 1.15


The law for addition is again positionwise addition. The various associative,
distributive laws apply as before, as does the commutative law for addition.

Equally well,
we can multiply a vector in $\mathbb{R}^3$ by a $3 \times 3$ matrix:

\[ \begin{pmatrix} a & b & c \\ d & e & f \\ g & h & i \end{pmatrix}\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \begin{pmatrix} ax + by + cz \\ dx + ey + fz \\ gx + hy + iz \end{pmatrix}. \]
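The index formula for $c_{ij}$ can be transcribed almost literally. In the sketch below matrices are lists of rows and indices run from 0 to 2 rather than from 1 to 3; the function names and the sample matrices are assumptions made for illustration.

```python
def matmul3(A, B):
    """Product of two 3x3 matrices: c_ij = a_i1*b_1j + a_i2*b_2j + a_i3*b_3j."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def apply3(A, v):
    """Multiply the column vector v in R^3 by the 3x3 matrix A."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]


A = [[1, 2, 0], [0, 1, 3], [4, 0, 1]]
B = [[2, 0, 1], [1, 1, 0], [0, 5, 2]]
v = [1, -1, 2]

# Entry c_23 (second row, third column) uses row 2 of A and column 3 of B.
print(matmul3(A, B)[1][2])  # 6
print(apply3(matmul3(A, B), v) == apply3(A, apply3(B, v)))  # True
```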




1.9. Areas and determinants 

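We return to the plane. Let $f$ be a non-singular linear transformation of the plane.
Since $f(\mathbf{v} + t\mathbf{w}) = f(\mathbf{v}) + tf(\mathbf{w})$, we know that $f$ carries lines into lines and hence is an
affine transformation. Thus $f$ carries squares into parallelograms. Furthermore, let
$\square_{\mathbf{v}}$ be the unit square whose left-hand lower corner is at $\mathbf{v}$,

\[ \square_{\mathbf{v}} = \left\{ \mathbf{v} + \begin{pmatrix} s \\ t \end{pmatrix} \;\middle|\; 0 \leq s < 1,\ 0 \leq t < 1 \right\} = \{ \mathbf{v} + \mathbf{w} \mid \mathbf{w} \in \square_{\mathbf{0}} \}. \]

Then the image of $\square_{\mathbf{v}}$ under $f$, which we denote by $f(\square_{\mathbf{v}})$, is just a translate of
the image of $\square_{\mathbf{0}}$ under $f$:

\[ f(\square_{\mathbf{v}}) = \{ f(\mathbf{v}) + f(\mathbf{w}) \mid \mathbf{w} \in \square_{\mathbf{0}} \} = \{ f(\mathbf{v}) + \mathbf{u} \mid \mathbf{u} \in f(\square_{\mathbf{0}}) \} \]

and thus $f(\square_{\mathbf{v}})$ has the same area as $f(\square_{\mathbf{0}})$. The same clearly holds if we consider a
square of any size, not necessarily the unit square. On the other hand, we can
subdivide the unit square into $4^k$ squares of side $2^{-k}$. Their images are translates
of one another, so each has area $4^{-k}$ times the area of the image of the unit square.
We conclude that, for any such square $\square$, the ratio of the area of $f(\square)$ to the area of $\square$
is a number which is independent of $\square$ (and of the size $2^{-k}$). Let us denote this
number by $\operatorname{Ar}(f)$, so that

\[ \operatorname{Ar}(f) = \frac{\text{area}\,(f(\square))}{\text{area}\,(\square)}. \]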



Figure 1.19

Figure 1.20

If $D$ is a more general region,
we can approximate it by a union of squares (and its image by the image
parallelograms) so that

\[ \operatorname{Ar}(f) = \frac{\text{area}\, f(D)}{\text{area}\, D} \]


for any (nice) region. (Strictly speaking, we should approximate it from the inside 
and the outside. If we assume that we can cover the boundary by a finite union 
of small squares whose total area can be made as small as we like, then the total 
area of the parallelograms covering the image of the boundary will also be as 
small as we like. Hence the approximation is legitimate. This is the meaning of 
our qualification that the region be ‘nice’.) 

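Thus $\operatorname{Ar}(f)$ gives the factor which tells us how area changes when we apply $f$.
If $f$ and $g$ are two non-singular linear transformations, then $g \circ f$ multiplies areas first
by $\operatorname{Ar}(f)$ and then by $\operatorname{Ar}(g)$, so that

\[ \operatorname{Ar}(g \circ f) = \operatorname{Ar}(g) \times \operatorname{Ar}(f). \]

Case 1: $f$ is represented by a diagonal matrix $\begin{pmatrix} r & 0 \\ 0 & s \end{pmatrix}$. Then $f$ carries the unit
square into a rectangle with sides $|r|$ and $|s|$, so $\operatorname{Ar}(f) = |rs|$ in this case.

Case 2a: $f$ is represented by $\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix}$. The image of the unit square is a
parallelogram with the same base and the same height, so the image has the same area as
the unit square. That is, area is unchanged by this shear transformation.
$\operatorname{Ar}(f) = 1$ in this case.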


Case 2b: $f$ is represented by $\begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix}$.

Figure 1.22(b)

Again the two shaded triangles have the same area, and so $\operatorname{Ar}(f) = 1$ in this
case.

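For any matrix $F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ we define

\[ \operatorname{Det} F = ad - bc. \]

We wish to prove the basic formula

\[ \operatorname{Ar}(f) = |\operatorname{Det} F|, \tag{1.7} \]

where $F$ is the matrix of $f$.
We have proved this formula for each of the three kinds of matrices listed above.
To prove it in general we make use of the following important property of
determinants:

\[ \operatorname{Det}(F \circ G) = (\operatorname{Det} F) \times (\operatorname{Det} G), \]

which can be verified by direct multiplication: if $G = \begin{pmatrix} e & f \\ g & h \end{pmatrix}$ then

\[
\begin{aligned}
\operatorname{Det}(F \circ G) &= \operatorname{Det}\begin{pmatrix} ae + bg & af + bh \\ ce + dg & cf + dh \end{pmatrix} \\
&= (ae + bg)(cf + dh) - (af + bh)(ce + dg) \\
&= (ad - bc)(eh - fg) = (\operatorname{Det} F) \times (\operatorname{Det} G).
\end{aligned}
\]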
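Both the product rule for determinants and the relation between determinant and area can be tested on examples. In the sketch below the area of the image of the unit square is computed with the shoelace formula for a polygon, which is a technique brought in here for the check and is not the argument used in the text; all function names are illustrative assumptions.

```python
def det(F):
    """Determinant ad - bc of a 2x2 matrix given as a pair of rows."""
    (a, b), (c, d) = F
    return a * d - b * c


def matmul(F, G):
    """Row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = F
    (e, f), (g, h) = G
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def apply(F, p):
    """Multiply the column vector p by the matrix F."""
    (a, b), (c, d) = F
    x, y = p
    return (a * x + b * y, c * x + d * y)


def polygon_area(pts):
    """Unsigned area of a polygon from its vertices (shoelace formula)."""
    total = 0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        total += x0 * y1 - x1 * y0
    return abs(total) / 2


UNIT_SQUARE = [(0, 0), (1, 0), (1, 1), (0, 1)]


def area_factor(F):
    """Area of the image of the unit square under F."""
    return polygon_area([apply(F, p) for p in UNIT_SQUARE])


F = ((2, 1), (1, 3))
G = ((0, 1), (1, 0))
print(det(matmul(F, G)) == det(F) * det(G))  # True
print(area_factor(F) == abs(det(F)))         # True
```

The matrix `G` has determinant $-1$ but area factor 1, which is why the absolute value is needed in the formula relating the two.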

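From the two rules $\operatorname{Ar}(f \circ g) = (\operatorname{Ar} f) \times (\operatorname{Ar} g)$ and $\operatorname{Det}(F \circ G) = (\operatorname{Det} F) \times (\operatorname{Det} G)$
we conclude that the formula

\[ \operatorname{Ar}(f) = |\operatorname{Det} F| \]

is true for any matrix that can be written as a product of matrices for which we
already know the formula to be true. We proved in section 7 (equation (1.5)) that

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ y & 1 \end{pmatrix}\begin{pmatrix} r & 0 \\ 0 & s \end{pmatrix}\begin{pmatrix} 1 & x \\ 0 & 1 \end{pmatrix} \]

whenever $a \neq 0$.
We have thus proved the formula for all matrices with $a \neq 0$. To deal with the
case $a = 0$ we can proceed in either of two ways:

(i) Direct verification:

\[ \operatorname{Ar}\begin{pmatrix} 0 & b \\ c & d \end{pmatrix} = |bc| = \left|\operatorname{Det}\begin{pmatrix} 0 & b \\ c & d \end{pmatrix}\right|. \]

(Details of the proof are left to the reader.)

(ii) Continuity argument: We can notice that both $\operatorname{Ar} f$ and $\operatorname{Det} F$ are continuous
functions of the entries of $F$ (i.e., if we change the entries slightly, the values of
$\operatorname{Ar} f$ and of $\operatorname{Det} F$ change only slightly). Now, if $\begin{pmatrix} 0 & b \\ c & d \end{pmatrix}$ is non-singular, so is
$\begin{pmatrix} \epsilon & b \\ c & d \end{pmatrix}$ for sufficiently small $\epsilon$ (indeed, $\operatorname{Ar}\begin{pmatrix} \epsilon & b \\ c & d \end{pmatrix}$
is non-zero). Thus, since we know that the equality

\[ \operatorname{Ar}\begin{pmatrix} \epsilon & b \\ c & d \end{pmatrix} = \left|\operatorname{Det}\begin{pmatrix} \epsilon & b \\ c & d \end{pmatrix}\right| \]

is true for all $\epsilon \neq 0$ close to zero, we conclude that it is true for $\epsilon = 0$ as well.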

We should point out the significance of the sign of $\operatorname{Det} F$ when $\operatorname{Det} F \neq 0$. (We
have given a meaning to its absolute value.) The meaning, at present, is best
illustrated by an example. The transformation

\[ \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \]

is a reflection about the $x$-axis.

Figure 1.23

It has the effect of switching counterclockwise rotation into clockwise rotation.
Thus 






Inverses 




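equation shows that $F$ cannot have an inverse if $\operatorname{Det} F = 0$: if $FG = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$,
we see from the above that, with $F^a = \begin{pmatrix} d & -b \\ -c & a \end{pmatrix}$,

\[ F^a = F^a F G = \begin{pmatrix} \operatorname{Det} F & 0 \\ 0 & \operatorname{Det} F \end{pmatrix} G = (\operatorname{Det} F)G. \]

If $\operatorname{Det} F = 0$, this would give $F^a = 0$ and hence $F = 0$, and then it is
certainly impossible to find a $G$ such that $FG = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$. The same equation,
$F^a = (\operatorname{Det} F)G$, shows that if $\operatorname{Det} F \neq 0$ then

\[ G = (\operatorname{Det} F)^{-1} F^a \]

is the inverse of $F$. Indeed, direct multiplication shows that the formula does give the
inverse of $F$. We have thus proved the following theorem:

A matrix $F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ has an inverse if and only if $\operatorname{Det} F \neq 0$. The
inverse matrix, $F^{-1}$, has the formula

\[ F^{-1} = \begin{pmatrix} \dfrac{d}{\operatorname{Det} F} & \dfrac{-b}{\operatorname{Det} F} \\[2ex] \dfrac{-c}{\operatorname{Det} F} & \dfrac{a}{\operatorname{Det} F} \end{pmatrix}. \]

We should understand the geometric meaning of $F^{-1}$; it ‘undoes’ the effect of
$F$. If we apply first $F$ and then $F^{-1}$ then we are back to the identity transformation.

We see also that $FF^{-1} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ (by direct multiplication if you like).

In particular, if $F\mathbf{v} = F\mathbf{w}$, then
$F^{-1}F\mathbf{v} = F^{-1}F\mathbf{w}$ or $\mathbf{v} = \mathbf{w}$ for any $F$ having an inverse.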
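The formula for the inverse can be checked by multiplying out. This sketch uses exact fractions so that the product with the original matrix comes out as the identity exactly; the function names are illustrative assumptions.

```python
from fractions import Fraction


def matmul(F, G):
    """Row-by-column product of two 2x2 matrices."""
    (a, b), (c, d) = F
    (e, f), (g, h) = G
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))


def inverse(F):
    """Inverse of a 2x2 matrix: entries d, -b, -c, a divided by Det F."""
    (a, b), (c, d) = F
    determinant = a * d - b * c
    if determinant == 0:
        raise ValueError("Det F = 0: F is singular and has no inverse")
    determinant = Fraction(determinant)
    return ((d / determinant, -b / determinant),
            (-c / determinant, a / determinant))


F = ((2, 1), (1, 3))
F_inv = inverse(F)
IDENTITY = ((1, 0), (0, 1))
print(matmul(F, F_inv) == IDENTITY)  # True
print(matmul(F_inv, F) == IDENTITY)  # True
```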

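It is reasonable that the condition $\operatorname{Det} F = 0$ corresponds to the singularity of
$F$ in view of the interpretation of $\operatorname{Det} F$ in terms of area. Indeed, suppose that the
parallelogram spanned by the origin and $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ has non-zero area,
meaning that $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ do not lie on the same straight line through the origin.
This means that $\operatorname{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix} \neq 0$, since $\left|\operatorname{Det}\begin{pmatrix} a & b \\ c & d \end{pmatrix}\right|$ is the area of this parallelogram.
In this case the inverse matrix exists, so we can write

\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} e & f \\ g & h \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

This is the same as saying that $\begin{pmatrix} 1 \\ 0 \end{pmatrix} = e\begin{pmatrix} a \\ c \end{pmatrix} + g\begin{pmatrix} b \\ d \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix} = f\begin{pmatrix} a \\ c \end{pmatrix} + h\begin{pmatrix} b \\ d \end{pmatrix}$.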
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Linear transformations of the plane 


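This of course implies that we can express any vector in the plane as a linear
combination of $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$.

We say that two vectors $\mathbf{u}$ and $\mathbf{v}$ are linearly independent if they do not lie on the
same line through the origin. Let $\mathbf{u}$ and $\mathbf{v}$ be linearly independent, and let $\mathbf{w}$ be any
vector in the plane. Form
the matrix $M$ whose columns are $\mathbf{u}$ and $\mathbf{v}$, so that $M\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \mathbf{u}$ and $M\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \mathbf{v}$,
and consider the vector $M^{-1}\mathbf{w}$. This is a well-defined vector in
$\mathbb{R}^2$ and hence we can write

\[ M^{-1}\mathbf{w} = \begin{pmatrix} a \\ b \end{pmatrix} = a\begin{pmatrix} 1 \\ 0 \end{pmatrix} + b\begin{pmatrix} 0 \\ 1 \end{pmatrix} \]

for some numbers $a$ and $b$. If we apply $M$ to both sides of this equation, we get

\[ \mathbf{w} = MM^{-1}\mathbf{w} = aM\begin{pmatrix} 1 \\ 0 \end{pmatrix} + bM\begin{pmatrix} 0 \\ 1 \end{pmatrix} = a\mathbf{u} + b\mathbf{v}. \]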
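The argument is constructive: the coefficients of a vector w with respect to u and v are the two components of the inverse matrix applied to w. The sketch below writes this out using the explicit formula for the inverse of the matrix whose columns are u and v; the function name `coefficients` and the sample vectors are illustrative assumptions.

```python
from fractions import Fraction


def coefficients(u, v, w):
    """Return (a, b) with w = a*u + b*v, for linearly independent u and v.

    M is the matrix with columns u and v; (a, b) is M^{-1} w.
    """
    determinant = u[0] * v[1] - v[0] * u[1]
    if determinant == 0:
        raise ValueError("u and v lie on one line through the origin")
    determinant = Fraction(determinant)
    a = (v[1] * w[0] - v[0] * w[1]) / determinant
    b = (-u[1] * w[0] + u[0] * w[1]) / determinant
    return a, b


u, v, w = (1, 1), (1, -1), (3, 1)
a, b = coefficients(u, v, w)
print(a, b)  # 2 1
print((a * u[0] + b * v[0], a * u[1] + b * v[1]) == w)  # True
```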


Thus if $\mathbf{u}$ and $\mathbf{v}$ are linearly independent, every vector in the plane can be written
as a linear combination of $\mathbf{u}$ and $\mathbf{v}$. Conversely, suppose that $\mathbf{u}$ and $\mathbf{v}$ are vectors
such that every vector in the plane can be written as a linear combination of $\mathbf{u}$
and $\mathbf{v}$. Then $\mathbf{u}$ and $\mathbf{v}$ clearly cannot lie on the same line through the origin, since
this would imply that every vector in the plane would have to lie on this line. Thus
$\mathbf{u}$ and $\mathbf{v}$ are linearly independent if and only if every vector in the
plane can be written as a linear combination of $\mathbf{u}$ and $\mathbf{v}$.

Suppose that $F$ is a matrix and $\mathbf{u}_1$ and $\mathbf{u}_2$ are any pair of linearly independent
vectors. The parallelogram spanned by $\mathbf{u}_1$ and $\mathbf{u}_2$ has non-zero area, hence
$F(\mathbf{u}_1)$ and $F(\mathbf{u}_2)$ will be linearly independent if and only if $\operatorname{Det} F \neq 0$.


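To summarize, the following assertions are all equivalent:

(1) $F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ has an inverse;

(2) Det F ≠ 0;

(3) the vectors $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$ do not lie on the same line through the origin;

(4) every vector in the plane can be written as a linear combination of $\begin{pmatrix} a \\ c \end{pmatrix}$ and $\begin{pmatrix} b \\ d \end{pmatrix}$;

(5) for some pair $\mathbf{u}_1, \mathbf{u}_2$ of vectors, the vectors $F(\mathbf{u}_1)$ and $F(\mathbf{u}_2)$ are linearly independent;

(6) for any pair $\mathbf{v}_1, \mathbf{v}_2$ of linearly independent vectors, the vectors $F(\mathbf{v}_1)$ and $F(\mathbf{v}_2)$ are linearly independent.     (1.9)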



Let us use the preceding considerations to illustrate some reasoning in affine geometry. We first remark that, in affine geometry, not only does the length of a segment make no sense, but also the comparative lengths of two segments which do not lie on the same line make no sense. Indeed, if $\mathbf{u}$ and $\mathbf{v}$ are two independent vectors, we can find a linear transformation f with $\mathbf{u} \to r\mathbf{u}$ and $\mathbf{v} \to s\mathbf{v}$ for any non-zero numbers r and s. Thus, by adjusting s/r, we can make the ratio of the lengths of $f(\mathbf{u})$ and $f(\mathbf{v})$ anything we please. On the other hand, the ratio of lengths of two segments lying on the same line does make sense. Indeed, since translations preserve length, we may assume that the line l and its image f(l) both pass through the origin. Since rotations preserve length, we may apply a rotation and assume that f(l) = l. But then if $0 \neq \mathbf{u} \in l$, the image $f(\mathbf{u})$ also lies in l, so $f(\mathbf{u}) = c\mathbf{u}$ for some constant c and hence $f(\mathbf{v}) = c\mathbf{v}$ for any $\mathbf{v} \in l$. Thus f changes the length of all segments on l by the same factor |c|.

We should also point out that given any two triangles $\Delta_1$ and $\Delta_2$ there is an affine transformation, f, with $f(\Delta_1) = \Delta_2$. Indeed, by translating, we may assume that one of the vertices of $\Delta_1$ is the origin. Let $\mathbf{u}_1$ and $\mathbf{v}_1$ be the two remaining vertices. They are linearly independent, since they are vertices of a triangle and so do not lie on a line through the origin. Similarly we may assume that the vertices of $\Delta_2$ are $0, \mathbf{u}_2, \mathbf{v}_2$. But then there is a unique linear f with $f(\mathbf{u}_1) = \mathbf{u}_2$ and $f(\mathbf{v}_1) = \mathbf{v}_2$.

As an illustration, consider the theorem that the three lines joining the vertices of a triangle to the midpoints of the opposite sides meet at a common point.

This is an assertion in affine geometry: the notion of midpoint makes sense, as does the assertion that three lines meet at a common point. To prove this theorem, it is enough to verify it for a single triangle, since we can find an affine transformation carrying any triangle into any other, and, if the theorem is true for one, it must be true for the other. But the theorem is clearly true for equilateral triangles. So we have proved the theorem in general.
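The argument can be illustrated numerically. The sketch below, with hypothetical helper names, builds the unique linear map carrying one triangle with a vertex at the origin onto another, and checks that it carries the common point of the medians of the first triangle to a point lying on all three medians of the second.

```python
from fractions import Fraction

def linear_map_from_pairs(u1, v1, u2, v2):
    """The unique 2x2 matrix F with F u1 = u2 and F v1 = v2 (u1, v1 independent)."""
    det = u1[0] * v1[1] - v1[0] * u1[1]
    if det == 0:
        raise ValueError("u1 and v1 lie on a line through the origin")
    det = Fraction(det)
    # F = [u2 v2] [u1 v1]^{-1}, written out entry by entry.
    a = (u2[0] * v1[1] - v2[0] * u1[1]) / det
    b = (v2[0] * u1[0] - u2[0] * v1[0]) / det
    c = (u2[1] * v1[1] - v2[1] * u1[1]) / det
    d = (v2[1] * u1[0] - u2[1] * v1[0]) / det
    return ((a, b), (c, d))

def apply(F, p):
    """Apply the matrix F to the point p."""
    return (F[0][0] * p[0] + F[0][1] * p[1], F[1][0] * p[0] + F[1][1] * p[1])

def midpoint(p, q):
    return (Fraction(p[0] + q[0], 2), Fraction(p[1] + q[1], 2))

def collinear(p, q, r):
    """True if p, q, r lie on one line."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0]) == 0

origin = (0, 0)
u1, v1 = (1, 0), (0, 1)          # first triangle: 0, u1, v1
u2, v2 = (4, 1), (1, 3)          # second triangle: 0, u2, v2
F = linear_map_from_pairs(u1, v1, u2, v2)

# Common point of the medians of the first triangle, and its image.
g1 = (Fraction(1, 3), Fraction(1, 3))
g2 = apply(F, g1)

# g2 lies on each median of the second triangle.
medians = [(origin, midpoint(u2, v2)), (u2, midpoint(origin, v2)), (v2, midpoint(origin, u2))]
print(g2, all(collinear(vertex, mid, g2) for vertex, mid in medians))
```

The check works because a linear map carries midpoints to midpoints and lines to lines, which is exactly what the affine argument uses.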


Let us examine what can happen when Det F = 0. There are two alternatives:

\[
\text{either } F = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{or}\quad F \neq \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]

If F is the zero matrix, then F maps every vector into 0; it collapses the whole plane into the origin. In the alternative case where F is not the zero matrix, but Det F = 0, we claim that the following two assertions hold:

(i) there is a line, l, such that $F(\mathbf{u}) \in l$ for every $\mathbf{u}$ in $\mathbb{R}^2$. Furthermore, every $\mathbf{v} \in l$ is of the form $\mathbf{v} = F(\mathbf{u})$. In other words F collapses the plane onto the line l.

(ii) there is a line, k, such that $F(\mathbf{w}) = 0$ if and only if $\mathbf{w} \in k$. In other words F collapses k into the origin, and does not send any other vector into the origin.
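Let us prove assertion (i). Let

\[
\mathbf{c}_1 = \begin{pmatrix} a \\ c \end{pmatrix} \quad\text{and}\quad \mathbf{c}_2 = \begin{pmatrix} b \\ d \end{pmatrix}
\]

be the two columns of $F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, so that

\[
\mathbf{c}_1 = F\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{c}_2 = F\begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]

Since F is not the zero matrix, $\mathbf{c}_1$ and $\mathbf{c}_2$ can not both be equal to 0. On the other hand, if $\mathbf{c}_1$ and $\mathbf{c}_2$ did not lie on the same line, then by the equivalence of assertion (3) and assertion (1) of (1.9) we would conclude that Det F ≠ 0, contrary to our assumption. Thus $\mathbf{c}_1$ and $\mathbf{c}_2$ lie on a line. Call this line l. Every vector $\mathbf{u}$ can be written as

\[
\mathbf{u} = \begin{pmatrix} x \\ y \end{pmatrix} = x\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \end{pmatrix},
\]

so $F(\mathbf{u}) = x\mathbf{c}_1 + y\mathbf{c}_2$ lies on l. If $\mathbf{c}_1 \neq 0$, every $\mathbf{v} \in l$ can be written as $\mathbf{v} = x\mathbf{c}_1$ for some number x and hence $\mathbf{v} = F\begin{pmatrix} x \\ 0 \end{pmatrix}$. If $\mathbf{c}_1 = 0$, then $\mathbf{c}_2 \neq 0$ and every $\mathbf{v} \in l$ can be written as $\mathbf{v} = y\mathbf{c}_2$ for some number y and hence $\mathbf{v} = F\begin{pmatrix} 0 \\ y \end{pmatrix}$. This completes the proof of (i).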

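Let us now prove (ii). Let $\mathbf{b}_1$ and $\mathbf{b}_2$ be the first and second columns of the matrix $F^a$, so

\[
\mathbf{b}_1 = \begin{pmatrix} d \\ -c \end{pmatrix} \quad\text{and}\quad \mathbf{b}_2 = \begin{pmatrix} -b \\ a \end{pmatrix}.
\]

Since $\mathrm{Det}\,F^a = \mathrm{Det}\,F = 0$, we know from (1.9) that $\mathbf{b}_1$ and $\mathbf{b}_2$ must be linearly dependent. They can not both be zero, since F is not the zero matrix by hypothesis. So they span a line. Call it k. Direct computation shows that $F\mathbf{b}_1 = 0 = F\mathbf{b}_2$, so every $\mathbf{w} \in k$ satisfies $F\mathbf{w} = 0$. If $\mathbf{w} = \begin{pmatrix} x \\ y \end{pmatrix}$ satisfies $F\mathbf{w} = 0$ then $\mathbf{w}$ must satisfy the equation

ax + by = 0.

This is the equation for a line, unless a = b = 0, i.e. $\mathbf{b}_2 = 0$, and this line must then be k. If $\mathbf{b}_2 = 0$, then c and d can not both vanish. But $\mathbf{w}$ must also satisfy

cx + dy = 0,

and this is the equation of a line, and the line must be k. This proves (ii).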

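For any F whatsoever, let im(F) denote the subset of $\mathbb{R}^2$ consisting of all elements of the form $F(\mathbf{u})$. In symbols,

im(F) = {v | v = F(u) for some u}.

The set im(F) is pronounced as 'the image of F'. Similarly, we define the 'kernel of F', written ker(F), to be the set of vectors which are sent into 0 by F:

ker(F) = {w | F(w) = 0}.

There are thus three possibilities:

(1) Det F ≠ 0. Then im(F) = $\mathbb{R}^2$ and ker(F) = {0}.

(2) Det F = 0 but F ≠ 0. Then im(F) = l is a line and ker(F) = k is a line. In other words both im(F) and ker(F) are one-dimensional.

(3) F = 0. Then im(F) = {0} and ker(F) = $\mathbb{R}^2$.

If we think of {0} as being a 'zero-dimensional' vector space we see that in all cases we have

dimension of im(F) + dimension of ker(F) = 2.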
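The three cases can be explored with a small sketch. The function names below are hypothetical; the kernel in the singular case is read off from the columns (d, -c) and (-b, a) used in the proof of (ii).

```python
def rank2(F):
    """Dimension of im(F) for a 2x2 matrix ((a, b), (c, d))."""
    (a, b), (c, d) = F
    if a * d - b * c != 0:
        return 2
    if (a, b, c, d) == (0, 0, 0, 0):
        return 0
    return 1

def image_and_kernel(F):
    """Return lists of spanning vectors for im(F) and ker(F)."""
    (a, b), (c, d) = F
    r = rank2(F)
    if r == 2:
        return [(a, c), (b, d)], []
    if r == 0:
        return [], [(1, 0), (0, 1)]
    # Singular but non-zero: the image is the line through a non-zero column,
    # the kernel is the line through a non-zero column of the matrix F^a.
    image = [(a, c)] if (a, c) != (0, 0) else [(b, d)]
    kernel = [(d, -c)] if (d, -c) != (0, 0) else [(-b, a)]
    return image, kernel

for F in [((1, 0), (0, 1)), ((1, 2), (2, 4)), ((0, 1), (0, 0)), ((0, 0), (0, 0))]:
    image, kernel = image_and_kernel(F)
    print(F, image, kernel, len(image) + len(kernel))
```

In every case the two list lengths add up to 2, which is the dimension count stated above.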

Special kinds of singular transformations. We now examine some special kinds of singular transformations.

1. Projections

A projection p is a singular transformation which maps the plane onto a line, its image, and which leaves every vector of that line fixed: if $\mathbf{v}$ is a vector in the image of p, $p(\mathbf{v}) = \mathbf{v}$.

This is rather special, since all we can expect in general for a singular transformation f is that $f(\mathbf{v})$ lies on the same line as $\mathbf{v}$; i.e., $f(\mathbf{v}) = a\mathbf{v}$ for some number a. Now, if $\mathbf{w}$ is an arbitrary vector, $\mathbf{v} = p(\mathbf{w})$ is in the image of p, and

\[
(p \circ p)(\mathbf{w}) = p(\mathbf{v}) = \mathbf{v} = p(\mathbf{w}).
\]

So $p \circ p = p$, and the matrix P representing p satisfies $P^2 = P$. Thus

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix}
= \begin{pmatrix} a & b \\ c & d \end{pmatrix}.
\]

So ab + bd = b, and if b ≠ 0, a + d = 1. Furthermore, ac + cd = c, so if c ≠ 0, a + d = 1. Even if b = 0 and c = 0, we have $a^2 = a$, $d^2 = d$, and Det P = ad = 0, so either a = 1, d = 0 or a = 0, d = 1, or P = 0. Therefore, unless P = 0, the trace of P, defined as tr P = a + d, must equal 1.

To summarize: a non-zero (singular) projection p satisfies $p \circ p = p$; its matrix P satisfies $P^2 = P$ and has zero determinant (ad - bc = 0) and unit trace (a + d = 1).

Conversely, suppose that ad - bc = 0 and a + d = 1. Then ab + bd = (a + d)b = b and ac + cd = c, while $a^2 + bc = a^2 + ad = a$, and $bc + d^2 = d$. Thus $P^2 = P$ and p is a projection onto a line.
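As a quick check of the criterion (zero determinant and unit trace), here is a minimal sketch; the example matrices and the helper names are illustrative only.

```python
from fractions import Fraction

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def is_projection_onto_line(P):
    """Test the criterion Det P = 0 and tr P = 1."""
    (a, b), (c, d) = P
    return a * d - b * c == 0 and a + d == 1

half = Fraction(1, 2)
P = ((half, half), (half, half))   # projection onto the line y = x
Q = ((1, 0), (3, 0))               # a projection onto the line y = 3x
R = ((1, 2), (2, 4))               # singular, but the trace is 5

for M in (P, Q, R):
    print(is_projection_onto_line(M), matmul2(M, M) == M)
```

For each matrix the two printed values agree: the criterion holds exactly when the matrix equals its own square.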

More generally, let us call an operator p a projection if $p^2 = p$. Then there are three possibilities:

(1) P is non-singular. In this case, we can multiply the equation $P^2 = P$ on both sides by $P^{-1}$ to obtain $P = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$, the identity. Here tr P = 2 = dim(im(p)).

(2) P is singular but not zero. Then P maps $\mathbb{R}^2$ onto a line and is the identity when restricted to this line. Here tr P = 1 = dim(im(p)), where we write im for image.

(3) $P = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, so P maps the whole plane to the origin and tr P = 0 = dim(im(p)).


2. Nilpotents

For a general singular matrix, the two lines im F and ker F will be different. Let us now consider a singular transformation n with the special property that its image and its kernel are the same. Applying n to any vector $\mathbf{w}$ yields a vector $\mathbf{v} = n(\mathbf{w})$ which is in the image of n and hence also in the kernel. It follows that $n(\mathbf{v}) = n \circ n(\mathbf{w}) = 0$. Thus $n \circ n$ collapses the entire plane into the origin, and the matrix N representing n must satisfy $N^2 = 0$. So

\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} a & b \\ c & d \end{pmatrix}
= \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix}
= \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.
\]

In particular, ab + bd = 0 and ac + cd = 0, so if b ≠ 0 or c ≠ 0, then a + d = 0. If bc = 0, then $a^2 = d^2 = 0$, so a = d = 0. In every case, then, N has zero trace.

Conversely, if tr N = a + d = 0 and det N = ad - bc = 0, then $N^2 = 0$.

Let us call a matrix nilpotent if some power of it vanishes. Thus N is nilpotent if $N^k = 0$ for some k. For such a matrix, we must have det N = 0, for otherwise we could keep multiplying the equation $N^k = 0$ by $N^{-1}$ until we get N = 0. Thus ad - bc = 0 and hence

\[
N^2 = \begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 = \begin{pmatrix} (a+d)a & (a+d)b \\ (a+d)c & (a+d)d \end{pmatrix}.
\]

Then

\[
N^k = \begin{pmatrix} (a+d)^{k-1}a & (a+d)^{k-1}b \\ (a+d)^{k-1}c & (a+d)^{k-1}d \end{pmatrix}.
\]

This can only vanish if a + d = 0. But then, we already know that $N^2 = 0$. Thus, in the plane, a matrix N is nilpotent if and only if $N^2 = 0$, and this holds if and only if

det N = 0 and tr N = 0.
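The identity behind this argument, namely that a singular 2 x 2 matrix satisfies $N^2 = (a + d)N$, is easy to test. The sketch below uses illustrative helper names and example matrices that do not come from the text.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def scale(t, A):
    """Multiply every entry of A by the number t."""
    return tuple(tuple(t * x for x in row) for row in A)

def is_nilpotent(N):
    """Criterion from the text: det N = 0 and tr N = 0."""
    (a, b), (c, d) = N
    return a * d - b * c == 0 and a + d == 0

ZERO = ((0, 0), (0, 0))
N = ((2, -1), (4, -2))    # det = 0, trace = 0
S = ((1, 2), (2, 4))      # det = 0, trace = 5

print(is_nilpotent(N), matmul2(N, N) == ZERO)
print(is_nilpotent(S), matmul2(S, S) == scale(5, S))
```

The second line shows why a singular matrix with non-zero trace can never be nilpotent: squaring it only rescales it by the trace.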


1.12. Two-dimensional vector spaces 

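A vector space V is called two-dimensional if we can find two vectors, $\mathbf{u}_1$ and $\mathbf{u}_2$ in V, such that every $\mathbf{v} \in V$ can be written uniquely as

\[
\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2.
\]

The word 'uniquely' means that if

\[
\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 \quad\text{and}\quad \mathbf{v} = b_1\mathbf{u}_1 + b_2\mathbf{u}_2
\]

then we must have

\[
a_1 = b_1 \quad\text{and}\quad a_2 = b_2.
\]

An ordered pair, $\mathbf{u}_1, \mathbf{u}_2$, of vectors with the above property is called a basis of the vector space. Such a choice of basis determines a map $L = L_{\mathbf{u}_1,\mathbf{u}_2}$ of V onto $\mathbb{R}^2$ by

\[
L(\mathbf{v}) = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix} \quad\text{if } \mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2.
\]

The 'uniqueness' part of our assumption above guarantees that the map L is well-defined; the components $a_1$ and $a_2$ are completely determined by $\mathbf{v}$. The map is also onto: given $\begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$, the vector $\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2$ clearly satisfies $L(\mathbf{v}) = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}$. Furthermore, if $\mathbf{v} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2$ and $\mathbf{w} = b_1\mathbf{u}_1 + b_2\mathbf{u}_2$ then

\[
\mathbf{v} + \mathbf{w} = a_1\mathbf{u}_1 + a_2\mathbf{u}_2 + b_1\mathbf{u}_1 + b_2\mathbf{u}_2 = (a_1 + b_1)\mathbf{u}_1 + (a_2 + b_2)\mathbf{u}_2,
\]

so

\[
L(\mathbf{v} + \mathbf{w}) = L(\mathbf{v}) + L(\mathbf{w})
\]

and similarly

\[
L(r\mathbf{v}) = rL(\mathbf{v})
\]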

for any real number r and any vector $\mathbf{v}$ of V. We say that L is an isomorphism* of V with $\mathbb{R}^2$. It allows us to identify all operations on and properties of the vector space V with operations on and properties of $\mathbb{R}^2$, just as in the one-dimensional case, a choice of basis allowed us to translate properties of a one-dimensional vector space into those of $\mathbb{R}^1$. Of course, just as in the one-dimensional case, the isomorphism, L, depends on the choice of basis. Thus, the choice of basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ is the two-dimensional analog of a 'choice of units'. Only those properties which are independent of the choice of basis will be interesting to us and of true geometrical character. We shall shortly study how L changes with a change of basis. For the moment, let us observe that the basis $\{\mathbf{u}_1, \mathbf{u}_2\}$ can be recovered from L. Indeed

\[
\mathbf{u}_1 = L^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{u}_2 = L^{-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]

So giving a basis is the same as giving an isomorphism, $L: V \to \mathbb{R}^2$. Given L, simply define $\mathbf{u}_1$ and $\mathbf{u}_2$ by the formulas above; then $\mathbf{u}_1, \mathbf{u}_2$ is a basis, and the isomorphism associated with $\{\mathbf{u}_1, \mathbf{u}_2\}$ is clearly L.

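A linear transformation $F: V \to V$ is a map of V into V which satisfies our usual identity:

\[
F(a\mathbf{u} + b\mathbf{v}) = aF(\mathbf{u}) + bF(\mathbf{v}).
\]

A choice of basis gives an identification $L: V \to \mathbb{R}^2$ and we can define a linear transformation of $\mathbb{R}^2$ by

\[
LFL^{-1}.
\]

Here $L^{-1}: \mathbb{R}^2 \to V$, then $F: V \to V$ and $L: V \to \mathbb{R}^2$. It is best to visualize the situation by a diagram:

\[
\begin{array}{ccc}
V & \xrightarrow{\;F\;} & V \\
\downarrow L & & \downarrow L \\
\mathbb{R}^2 & \xrightarrow{\;LFL^{-1}\;} & \mathbb{R}^2
\end{array}
\]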

* In mathematics, the word isomorphism means a one-to-one mapping which preserves all the 
relevant structure. For vector spaces, V and W, we say that a map L from V to W is an 
isomorphism if it is linear, is one-to-one and onto. 
The transformation $LFL^{-1}$ going from $\mathbb{R}^2 \to \mathbb{R}^2$ along the bottom is obtained by going up, across and down. Now any linear transformation of $\mathbb{R}^2 \to \mathbb{R}^2$ is given by a matrix. Thus, once we have chosen the basis L, we have associated a matrix

\[
\mathrm{Mat}_L(F) = LFL^{-1}
\]

to any linear operator $F: V \to V$. If $G: V \to V$ is a second linear transformation, then

\[
LGFL^{-1} = LGL^{-1}\,LFL^{-1}
\]

so

\[
\mathrm{Mat}_L(G \circ F) = \mathrm{Mat}_L(G)\,\mathrm{Mat}_L(F).
\]

In other words, composition of linear transformations goes over into matrix multiplication. Similarly for addition of linear transformations. Thus the algebra of linear transformations on V gets translated into the algebra of 2 x 2 matrices.

The space $\mathbb{R}^2$ is itself a vector space. It has a natural basis consisting of $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. If $f: \mathbb{R}^2 \to \mathbb{R}^2$ is a linear transformation, its matrix relative to this natural basis is the matrix F, in the language of the preceding few sections. The map $L: \mathbb{R}^2 \to \mathbb{R}^2 = V$ corresponding to this basis is just the identity I. Thus the relation between f and F should be written as

\[
F = \mathrm{Mat}_I(f).
\]

From a strictly logical point of view we should have used the notation $\mathrm{Mat}_I(f)$ instead of F from the very beginning, but it would have been too cumbersome.
From now on, once we have the idea of a linear transformation on a general vector 
space, we shall drop the distinction between lower case letters and upper case letters. 

The assignment of Mat L (F) to F does depend on an artifact, namely on the 
choice of basis. We now must examine what happens when we change the basis. 

So suppose that we are given two bases. This means that we are given two isomorphisms, $L: V \to \mathbb{R}^2$ and $M: V \to \mathbb{R}^2$. Then we can consider the matrix $B = ML^{-1}: \mathbb{R}^2 \to \mathbb{R}^2$, so

\[
M = BL.
\]

We can visualize the situation by the diagram:

[Diagram: V at the top, mapped by L and by M down to two copies of $\mathbb{R}^2$, which are joined along the bottom by $B: \mathbb{R}^2 \to \mathbb{R}^2$.]

The matrix B is called the 'change of basis matrix' (relative to the bases L and M). It is the two-dimensional analog of the factor 1000 by which we have to multiply all numerical values of masses when we pass from kilograms to grams in our choice of unit. To repeat: $L(\mathbf{v})$ and $M(\mathbf{v})$ are two points in $\mathbb{R}^2$ corresponding to the same point $\mathbf{v}$ in V by the two choices, L and M, of bases. These two points in $\mathbb{R}^2$ are related to one another by the change of basis matrix:

\[
M(\mathbf{v}) = BL(\mathbf{v}).
\]

The matrix B gives an isomorphism of $\mathbb{R}^2 \to \mathbb{R}^2$, i.e. it is non-singular. (It is clear that, if we are given L and also given a non-singular matrix B, then we can define M = BL, and this M is an isomorphism of V with $\mathbb{R}^2$. Thus, once we have fixed some basis, L, of V, the set of all other bases is parameterized by the set of all invertible 2 x 2 matrices, B.)

Now suppose that $F: V \to V$ is a linear transformation. Then

\[
\mathrm{Mat}_L(F) = LFL^{-1}
\]

and

\[
\mathrm{Mat}_M(F) = MFM^{-1}.
\]

But $MFM^{-1} = (BL)F(BL)^{-1} = BLFL^{-1}B^{-1} = B(LFL^{-1})B^{-1}$, so

\[
\mathrm{Mat}_M(F) = B\,\mathrm{Mat}_L(F)\,B^{-1}.
\]

This important formula tells us how the two matrices of the same linear transformation are related to one another when we know the change of basis matrix, B.
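To see the formula at work, take V to be $\mathbb{R}^2$ itself with two non-standard bases. The sketch below is only an illustration: the helper `coordinate_map`, which returns the matrix of the isomorphism attached to a basis, and the particular bases and matrix F are assumptions made for the example.

```python
from fractions import Fraction

def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse2(F):
    """Inverse of a 2x2 matrix with non-zero determinant."""
    (a, b), (c, d) = F
    D = Fraction(a * d - b * c)
    if D == 0:
        raise ValueError("singular matrix")
    return ((d / D, -b / D), (-c / D, a / D))

def coordinate_map(u1, u2):
    """Matrix of L: V -> R^2 for the basis u1, u2, so that L(u1) = e1, L(u2) = e2."""
    return inverse2(((u1[0], u2[0]), (u1[1], u2[1])))

def mat(L, F):
    """Mat_L(F) = L F L^{-1}."""
    return matmul2(matmul2(L, F), inverse2(L))

F = ((1, 2), (3, 4))                      # a linear transformation of V = R^2
L = coordinate_map((1, 1), (1, 2))        # first basis
M = coordinate_map((2, 1), (1, 1))        # second basis
B = matmul2(M, inverse2(L))               # change of basis matrix, M = BL

lhs = mat(M, F)
rhs = matmul2(matmul2(B, mat(L, F)), inverse2(B))
print(lhs == rhs)
print(mat(L, F), mat(M, F))
```

The two matrices printed on the last line look different, but they have the same trace and determinant as F, as conjugate matrices must.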

For a given linear transformation, $F: V \to V$, it may be possible to choose a basis, L, so that $\mathrm{Mat}_L(F)$ has a particularly convenient or instructive form. For example, suppose that $F: V \to V$ sends all of V onto a line and sends this line into 0, in other words that im F = ker F. Let us choose some vector $\mathbf{u}_2$ which does not belong to ker F and set $\mathbf{u}_1 = F(\mathbf{u}_2)$, so $\mathbf{u}_1 \neq 0$ and $F(\mathbf{u}_1) = 0$. We take $\mathbf{u}_1, \mathbf{u}_2$ as our basis. Then

\[
LFL^{-1}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = LF(\mathbf{u}_1) = L(0) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}
\quad\text{and}\quad
LFL^{-1}\begin{pmatrix} 0 \\ 1 \end{pmatrix} = LF(\mathbf{u}_2) = L(\mathbf{u}_1) = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.
\]

So, for this choice of basis we have

\[
\mathrm{Mat}_L(F) = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \qquad (1.10)
\]

Now in this entire discussion, there is nothing to prevent us from considering the case where our vector space, V, happens to be $\mathbb{R}^2$ itself. When we identified a linear transformation with a matrix, it was with respect to the standard basis. In other words, when we wrote F in sections 1.5 and 1.6, it should have been written as $\mathrm{Mat}_I(F)$. So, for example, let N be a non-zero nilpotent matrix. Thus $N = \mathrm{Mat}_I(F)$, where F is a linear transformation of $\mathbb{R}^2$ with ker F = im F. (In words we would say that N is the matrix of the linear transformation F relative to the standard basis.) From the preceding considerations we know that we can find some other basis, L, relative to which (1.10) holds. By the change of basis formula (the change of basis from L to I) we know that

\[
N = B\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}B^{-1}. \qquad (1.11)
\]

We have thus proved: given any non-zero nilpotent matrix N, we can find an invertible matrix B such that (1.11) holds. We shall return to these kinds of considerations (and, in particular, how to find B) in the next chapter.
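The construction in this paragraph already yields one possible B. As a hedged sketch (the function name and the rule "try the standard basis vectors in turn" are choices made here, not the method of the next chapter): pick $\mathbf{u}_2$ with $N\mathbf{u}_2 \neq 0$, set $\mathbf{u}_1 = N\mathbf{u}_2$, and let B be the matrix with columns $\mathbf{u}_1, \mathbf{u}_2$.

```python
def matmul2(A, B):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def nilpotent_basis_matrix(N):
    """For a non-zero nilpotent N, return B with columns u1 = N u2 and u2."""
    (a, b), (c, d) = N
    if a * d - b * c != 0 or a + d != 0:
        raise ValueError("N is not nilpotent")
    for u2 in ((1, 0), (0, 1)):
        u1 = (a * u2[0] + b * u2[1], c * u2[0] + d * u2[1])
        if u1 != (0, 0):
            return ((u1[0], u2[0]), (u1[1], u2[1]))
    raise ValueError("N is the zero matrix")

J = ((0, 1), (0, 0))
N = ((2, -1), (4, -2))
B = nilpotent_basis_matrix(N)

# N = B J B^{-1} is equivalent to N B = B J, which avoids computing B^{-1}.
print(B, matmul2(N, B) == matmul2(B, J))
```

Checking NB = BJ rather than N = BJB^{-1} keeps the arithmetic in integers; B is invertible because $\mathbf{u}_1$ lies in ker N while $\mathbf{u}_2$ does not.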

For an important application to physics of the results of this chapter please turn to Chapter 9. There we show how Gaussian optics is really the study of 2 x 2 matrices. Most of Chapter 9 can be read with only a knowledge of Chapter 1.

Appendix: the fundamental theorem of affine geometry 

We wish to prove the following:

Let f be an affine transformation of $\mathbb{R}^2$ satisfying $f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$. Then f is linear.

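In proving this theorem, we can make a number of simplifying reductions. Notice that, if g is an invertible linear transformation, then $g \circ f$ is linear if and only if f is linear. Now $f\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $f\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ cannot lie on the same line through the origin. They are thus linearly independent and hence we can find a linear transformation g with

\[
g \circ f\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad g \circ f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.
\]

Thus, replacing f by $g \circ f$, it is enough to prove the following:

Let f be an affine transformation satisfying $f\begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$, $f\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $f\begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Then f is the identity transformation.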

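Proof. From section 1.2 we know that

\[
f\left(\begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix}\right) = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.
\]

In fact, we proved that $f\begin{pmatrix} r \\ s \end{pmatrix} = \begin{pmatrix} r \\ s \end{pmatrix}$ whenever r and s are rational.

Thus f carries the x-axis, the y-axis and the line x = y, which is the line through $\begin{pmatrix} 0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$, into themselves.

Thus, for any real number a,

\[
f\begin{pmatrix} a \\ 0 \end{pmatrix} = \begin{pmatrix} \phi(a) \\ 0 \end{pmatrix},
\]

where $\phi$ is some function. (We want to prove $\phi(a) = a$ for all a.) Similarly

\[
f\begin{pmatrix} 0 \\ b \end{pmatrix} = \begin{pmatrix} 0 \\ \psi(b) \end{pmatrix}
\]

for some function $\psi$, and since $\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} a \\ 0 \end{pmatrix} + \begin{pmatrix} 0 \\ b \end{pmatrix}$, we have

\[
f\begin{pmatrix} a \\ b \end{pmatrix} = \begin{pmatrix} \phi(a) \\ \psi(b) \end{pmatrix}.
\]

We claim that the functions $\phi$ and $\psi$ are the same. Indeed, consider the line x = a. It is parallel to the y-axis, so its image is the line $x = \phi(a)$. The line x = a intersects the line x = y at the point $\begin{pmatrix} a \\ a \end{pmatrix}$, and the line $x = \phi(a)$ intersects the line x = y at $\begin{pmatrix} \phi(a) \\ \phi(a) \end{pmatrix}$. Hence

\[
\begin{pmatrix} \phi(a) \\ \psi(a) \end{pmatrix} = f\begin{pmatrix} a \\ a \end{pmatrix} = \begin{pmatrix} \phi(a) \\ \phi(a) \end{pmatrix}
\]

and so $\phi(a) = \psi(a)$ for all a.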


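Figure 1.25

Since $f(\mathbf{v} + \mathbf{w}) = f(\mathbf{v}) + f(\mathbf{w})$, we also have

\[
\phi(a + b) = \phi(a) + \phi(b).
\]

All of this is essentially the same level of argument as in section 1.2. We now establish the surprising fact that

\[
\phi(ab) = \phi(a)\phi(b).
\]

Indeed, consider figure 1.26:

Figure 1.26

The line joining $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ to $\begin{pmatrix} b \\ 0 \end{pmatrix}$ is parallel to the line joining $\begin{pmatrix} 0 \\ a \end{pmatrix}$ to $\begin{pmatrix} ab \\ 0 \end{pmatrix}$. Thus the value ab can be obtained by parallels and intersections. Therefore, drawing the same diagram for $\phi(a)$ and $\phi(b)$, we see that

\[
\begin{pmatrix} \phi(ab) \\ 0 \end{pmatrix} = f\begin{pmatrix} ab \\ 0 \end{pmatrix} = \begin{pmatrix} \phi(a)\phi(b) \\ 0 \end{pmatrix}
\]

so

\[
\phi(ab) = \phi(a)\phi(b).
\]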

Now a real number x is positive if and only if $x = y^2$ for some other number y ≠ 0. Then

\[
\phi(x) = \phi(y^2) = \phi(y)^2
\]

so

x > 0 implies $\phi(x) > 0$.

Thus a - b > 0 implies $\phi(a) - \phi(b) > 0$. Thus if

r < a < s

then

\[
\phi(r) < \phi(a) < \phi(s).
\]

Now for any real number a we can find rational numbers r and s with r < a < s and s - r as small as we please. But, for rational numbers, $\phi(r) = r$ and $\phi(s) = s$. Thus

\[
r < \phi(a) < s.
\]

Hence $|a - \phi(a)| < s - r$. Since s - r can be chosen arbitrarily small, this implies $\phi(a) = a$.
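The claim that a product ab can be obtained 'by parallels and intersections' can be made concrete. The sketch below uses one construction that fits the description (a line through (0, 1) and (b, 0), and its parallel through (0, a)); the figure in the text may be drawn differently, and the function name is illustrative.

```python
from fractions import Fraction

def product_by_parallels(a, b):
    """Locate ab using only a parallel line and an intersection.

    Draw the line through (0, 1) and (b, 0). The parallel line through (0, a)
    meets the x-axis at the point (ab, 0).
    """
    a, b = Fraction(a), Fraction(b)
    direction = (b, Fraction(-1))        # from (0, 1) to (b, 0)
    # Points of the parallel line are (0, a) + t * direction.
    # The second coordinate a - t vanishes exactly when t = a.
    t = a
    x = t * direction[0]
    y = a + t * direction[1]
    assert y == 0                        # the point found lies on the x-axis
    return x

print(product_by_parallels(3, Fraction(5, 2)))
print(product_by_parallels(-2, 7))
```

Because an affine transformation preserves parallelism and intersections, it must carry this whole construction to the corresponding construction for the images, which is the reason the function called φ in the proof is multiplicative.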




Summary 

A Transformations of the plane

You should be able to define the terms affine transformation, linear transformation, and Euclidean transformation.

You should be able to identify geometric properties that are preserved by affine transformations and properties that are preserved by Euclidean transformations.

B Matrix algebra

You should know how to add and multiply two square matrices of the same size. You should be able to calculate the determinant of a 2 x 2 matrix and to write down its inverse when the inverse exists.

C Matrices and linear transformations

Given sufficient information about a linear transformation of the plane, you should be able to write down the 2 x 2 matrix that represents the transformation.

You should understand the significance of matrix multiplication in terms of composition of linear transformations and be able to apply this relationship.

You should be able to determine the image and kernel of the transformation represented by a given 2 x 2 matrix.

You should be able to identify 2 x 2 matrices that represent transformations with special properties (rotations, reflections, projections, nilpotent transformations).


Exercises 

1.1 Here are some theorems of Euclidean plane geometry. Decide whether 
each is a valid statement in affine plane geometry. 

(a) The medians of a triangle meet at a point which is 2/3 of the way from each vertex to the midpoint of the opposite side.

(b) The angle bisectors of an isosceles triangle are equal in length.

(c) The diagonals of a rhombus are perpendicular.

(d) The diagonals of a parallelogram bisect each other.

(e) Let PQR and P'Q'R' be two triangles such that the lines PQ and P'Q' are parallel, QR and Q'R' are parallel, and PR and P'R' are parallel. Then the three lines PP', QQ', and RR' are either parallel or concurrent.

1.2(a) Let $A_1$ and $A_2$ be affine lines. Let x be an affine coordinate function on $A_1$; let y be an affine coordinate function on $A_2$. Let $f: A_1 \to A_2$ be an affine mapping. Associated with f is a function $F: \mathbb{R} \to \mathbb{R}$ such that if Q = f(P), then $y(Q) = F \circ x(P)$. Show that the most general formula for F is $F(\alpha) = r\alpha + s$.

(b) Let x' = ax + b, y' = cy + d, so that x' and y' are new affine coordinate functions on $A_1$ and $A_2$ respectively. If $y(Q) = F \circ x(P)$ where $F(\alpha) = r\alpha + s$, find the formula for the function $F'(\beta)$ such that $y'(Q) = F' \circ x'(P)$.


1.3 A function $u: \mathbb{R}^2 \to \mathbb{R}$ is affine if it is an affine function on each line of the plane and if, for any parallelogram, u(P) + u(R) = u(Q) + u(S) where the vertices are labeled as in figure 1.27. Suppose that $u: \mathbb{R}^2 \to \mathbb{R}$ is affine and


that u 

1 


3 

— 8 u 

2 

— q 


_2_ 


_3_ 


_-l_ 



Figure 1.27


(a) Find a formula for u 

(b) Sketch R 2 , showing t 

_k_ 
he 1 

ines u = constant. 

1.4 Find the image of the rectangle ABCDE shown in figure 1.28 under the linear transformation represented by each of the following matrices. In each case calculate the determinant of the transformation and verify that the area and orientation of the image of the rectangle are correctly predicted by this determinant.

Figure 1.28

(a) The rotation $R_{\pi/2} = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

(b) The rotation $R_{\pi/4} = (1/\sqrt{2})\begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix}$

(c) The 'distortion' $D_2 = \begin{pmatrix} 2 & 0 \\ \ldots & \ldots \end{pmatrix}$

(d) The 'Lorentz transformation' $L_2 = \begin{pmatrix} 5/4 & 3/4 \\ 3/4 & 5/4 \end{pmatrix}$

(e) The shear transformation $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$

(f) The shear transformation $\begin{pmatrix} 1/2 & 1/2 \\ -1/2 & 3/2 \end{pmatrix}$

(g) The reflection $M_0 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$

(h) The reflection $M_{\pi/4} = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$

(i) The projection $P_{\pi/4} = \begin{pmatrix} 1/2 & 1/2 \\ 1/2 & 1/2 \end{pmatrix}$
the product matrix



(b) 5? 


( C ) ^-n/2^0 

(d) PU 

(e) A % 


1.6. Calculate the inverse of each of the following matrices, and interpret the 



the matrix 



1.9. Devise a procedure for writing any matrix 


a o 
c d 


with c ^ 0 as a triple 
product


1 y\(0 f\n 


X 


,0 1 


oAo l 


- 12 


26 


and apply this proc e dure to the matrix 


-4-8. 


1.10. Prove that Ar 


'0 

x 


b' 
d, 


Detl 


'0 b' 
X d y 


by 


(a) direct verification (find the image of the unit square), and by 

(b) using the decomposition in Exercise 1.9, which works even when a — 0. 

1.11. Construct a 2 x 2 matrix which represents each of the following transformations of the plane:

(a) A transformation P, satisfying $P^2 = P$, which maps the entire plane onto the line y = 2x and which maps the line y = -2x into the origin.

(b) A shear transformation S which carries every point on the line y = 2x into itself, which transforms the y-axis into the line y = -x, and which satisfies the condition $(S - I)^2 = 0$.

n\ ( T 

(c) A transformation which carries 


and which carries 




into 


\4; 




| jntn j 






L-Jj 



(d) A nilpotent transformation N, satisfying N 2 = 0, whose image and 
kernel are both the line y = 3x. 


1.12. For practice in multiplying 3 x 3 matrices, consider the two matrices

\[
U = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 0 \end{pmatrix}, \qquad
L = \begin{pmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}.
\]

Calculate UL, LU and $U^2$.


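1.13. Define the determinant of a 3 x 3 matrix by

\[
\mathrm{Det}\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{pmatrix}
= a_{11}\,\mathrm{Det}\begin{pmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{pmatrix}
- a_{12}\,\mathrm{Det}\begin{pmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{pmatrix}
+ a_{13}\,\mathrm{Det}\begin{pmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{pmatrix}
\]
\[
= a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} - a_{12}a_{21}a_{33} + a_{12}a_{23}a_{31} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}.
\]

Prove that

\[
\mathrm{Det}(F \circ G) = \mathrm{Det}\,F \times \mathrm{Det}\,G.
\]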

1.14. Show that, if the matrix 


/«u 

*21- 


12 


a 13\ 


22 


XT 


23 


V 


a 


a 33/ 


31 


32 


satisfies the conditions a, ] ^ 0 and a t ^a 22 —a 12 q 2 i ¥= 0, then we can write 




\[
\mathrm{Vol}\,F = \frac{\text{volume } F(D)}{\text{volume } D}
\]

for any region D and, in particular,

\[
\mathrm{Vol}(F \circ G) = \mathrm{Vol}\,F \times \mathrm{Vol}\,G
\]

and Vol F = volume F(□), where □ is the unit cube. Prove that

Vol F = |Det F|.

1.16. Consider an affine transformation of the plane which does not leave the 
origin fixed: 

CKM:) 

where A represents an affine transformation which leaves the origin fixed. 




-A* 


represented by the 
following: 


(b) When such a 3 x 3 matrix acts on $\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$, the third component of the resulting vector is 1.

(c) The matrices $T(a, b) = \begin{pmatrix} 1 & 0 & a \\ 0 & 1 & b \\ 0 & 0 & 1 \end{pmatrix}$ represent pure translations, and they obey the composition law

T(a, b)T(c, d) = T(a + c, b + d).

From a geometric point of view we can give the following interpretation to Exercise 1.16: We are considering the affine plane as the plane z = 1 in $\mathbb{R}^3$. We have identified the group of affine motions as a group of linear transformations of $\mathbb{R}^3$.

Figure 1.29

To each point of the plane z = 1 we can associate the line joining that point to the origin: any point on our plane determines a unique line through the origin, and any line through the origin which meets the plane z = 1 meets it at a unique point. We can thus identify 'points' in our affine plane with certain kinds of lines through the origin in $\mathbb{R}^3$: those that intersect the plane z = 1. The advantage to this interpretation is that it gives us a grip on the notion of (artistic) perspective: two plane figures in $\mathbb{R}^3$ (not containing the origin) are 'in perspective' from the origin if they determine the same family of lines through the origin. This suggests that we consider all lines through the origin in $\mathbb{R}^3$ as 'points' of a new space which enlarges the affine plane. The new 'points' that we have added are those lines through the origin in $\mathbb{R}^3$ which lie in the z = 0 plane, as these are the only lines through the origin which do not meet the plane z = 1. Let $\begin{pmatrix} a \\ b \\ 0 \end{pmatrix} \neq \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$ be a point of $\mathbb{R}^3$ in the z = 0 plane and let P denote the line through the origin and $\begin{pmatrix} a \\ b \\ 0 \end{pmatrix}$, so P is one of our new 'points'. Thus

\[
P = \left\{ \begin{pmatrix} at \\ bt \\ 0 \end{pmatrix} \right\}.
\]

From the point of view of $\mathbb{R}^3$, where P is a line, we can approximate P by the family of lines through the origin $P_\varepsilon$ where

\[
P_\varepsilon = \left\{ \begin{pmatrix} at \\ bt \\ \varepsilon t \end{pmatrix} \right\}.
\]

As $\varepsilon \to 0$, $P_\varepsilon \to P$. But $P_\varepsilon$ intersects the z = 1 plane, when $t = 1/\varepsilon$, at the point $\begin{pmatrix} a/\varepsilon \\ b/\varepsilon \\ 1 \end{pmatrix}$.

Figure 1.31

The points $P_\varepsilon$ in the affine plane move off to infinity as $\varepsilon \to 0$. We may thus think of the new 'points' as 'points at infinity' of the affine plane. These new 'points at infinity' were first introduced in the theoretical study of perspective by artists and geometers of the fifteenth and sixteenth centuries.

We have thus introduced a new space, called $P^2$, the projective plane. A 'point' of $P^2$ is just a line through the origin in $\mathbb{R}^3$. Let us now see how to define a 'line' in $P^2$. A line in the affine plane z = 1 consists of points, each of which determines a line through the origin. But this means that the family of lines through the origin sweep out a plane through the origin. So we define a 'line' in $P^2$ to be a plane through the origin in $\mathbb{R}^3$. There is only one plane through the origin in $\mathbb{R}^3$ which does not intersect the plane z = 1 and that is the plane z = 0. We have thus to add just one 'line at infinity'. A 'point' P lies on the 'line' l if the line through the origin, P, lies in the plane through the origin, l. Two distinct 'points', P and Q (that is, two distinct lines through the origin) determine a unique plane through the origin, i.e., two distinct 'points' determine a unique 'line'. Any two distinct 'lines' intersect in a unique 'point', in contrast to affine geometry where two lines can be parallel. Two parallel lines in the affine plane meet at a 'point at infinity'.

To summarize:

A 'point' in $P^2$ is a line through the origin in $\mathbb{R}^3$.
A 'line' in $P^2$ is a plane through the origin in $\mathbb{R}^3$.
Any two distinct 'points' lie on a unique 'line': the plane containing the two lines through the origin.
Any two distinct 'lines' meet in a unique 'point': the line in which the two planes intersect, which passes through the origin.
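These incidence rules are easy to compute with. In the sketch below a 'point' is represented by any non-zero vector on its line through the origin, and a 'line' by a normal vector n to its plane (so the plane is n·x = 0); this representation by normal vectors and the function names are assumptions of the example, not notation from the text.

```python
def cross(p, q):
    """Cross product of two vectors in R^3."""
    return (
        p[1] * q[2] - p[2] * q[1],
        p[2] * q[0] - p[0] * q[2],
        p[0] * q[1] - p[1] * q[0],
    )

def same_projective(p, q):
    """True if p and q are proportional, i.e. represent the same 'point' or 'line'."""
    return cross(p, q) == (0, 0, 0)

def line_through(p, q):
    """Normal vector of the plane through the origin containing p and q."""
    return cross(p, q)

def meet(m, n):
    """Direction of the line in which the planes with normals m and n intersect."""
    return cross(m, n)

# The affine points (0, 0) and (1, 1), placed in the plane z = 1.
line_a = line_through((0, 0, 1), (1, 1, 1))       # the line y = x
# The affine points (0, 1) and (1, 2): the parallel line y = x + 1.
line_b = line_through((0, 1, 1), (1, 2, 1))

point = meet(line_a, line_b)
print(line_a, line_b, point)
print(point[2] == 0)    # third coordinate 0: a 'point at infinity'
```

The two parallel affine lines meet at a 'point' whose third coordinate is zero, and its first two coordinates give the common direction (1, 1) of the lines.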


1.17. (a) Show that any invertible 3x3 matrix determines a one-to-one 
transformation of the projective plane, P 2 , which carries ‘lines’ into 
‘lines’. 

(b) Show that two invertible 3x3 matrices A and B determine the same 
transformation of P 2 if and only if A = cB for some non-zero real 
number, c. 


1.18. Three vectors $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ in $\mathbb{R}^3$ are called linearly independent if no equation of the form

\[
a\mathbf{u} + b\mathbf{v} + c\mathbf{w} = 0
\]

can hold unless a, b and c are all zero. Show that if $\mathbf{u}$, $\mathbf{v}$ and $\mathbf{w}$ are linearly independent, then there exists a unique 3 x 3 matrix A such that

\[
A\mathbf{u} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}, \quad A\mathbf{v} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} \quad\text{and}\quad A\mathbf{w} = \begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}
\]

and that A is invertible.

1.19. (a) Let $P_1, P_2, P_3$ be the 'points' in $P^2$ given by the lines through the origin and $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$ respectively. Let $Q_1, Q_2, Q_3$ be any three 'points' of $P^2$ which do not lie on the same 'line'. Show that there is an invertible 3 x 3 matrix which carries $Q_1$ into $P_1$, $Q_2$ into $P_2$ and $Q_3$ into $P_3$.

(b) Let $Q_1, Q_2, Q_3, Q_4$ be four 'points' in $P^2$, no three of which lie on the same 'line'. Let $R_1, R_2, R_3, R_4$ be another set of four 'points', no three of which lie on a 'line'. Show that there exists a 3 x 3 matrix which carries $Q_1$ to $R_1$, $Q_2$ to $R_2$, $Q_3$ to $R_3$ and $Q_4$ to $R_4$.

(c) Prove the ‘fundamental theorem of projective geometry’ which asserts 
that any one-to-one transformation of P 2 which carries ‘lines’ into 
‘lines’ comes from a 3 x 3 matrix. (Hint: Reduce to the fundamental 
theorem of affine geometry proved in the appendix to this chapter.) 

1.20. As an illustration of the use of 1.19(b), prove Fano's theorem which says:

Let A, B, C, D be four points, no three of which lie on a line.

Figure 1.33

Let P be the point of intersection of AB and CD.

Let Q be the point of intersection of AC and BD.

Let R be the point of intersection of AD and BC.

Then P, Q and R do not lie on a line.

(Hint: Reduce to a special case; for example, A, B, C, the three vertices of an equilateral triangle and D its center.)
In Chapter 2 we discuss conformal linear geometry in the
plane, that is, the geometry of lines and angles, and its relation 
to certain kinds of 2 x 2 matrices. We also discuss the notion 
of eigenvalues and eigenvectors, so important in quantum 
mechanics. We use these notions to give an algorithm for 
computing the powers of a matrix. As an application we 
study the basic properties of Markov chains. 




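transformations, so is their composition $g \circ f$.

Suppose that f preserves angle and orientation. We can find some rotation $r_{-\theta}$ such that $r_{-\theta} \circ f$ takes $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ into a point on the positive x-axis. Then $r_{-\theta} \circ f$ takes $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ into a point on the positive y-axis, since $r_{-\theta} \circ f$ preserves angles and $\mathrm{Det}(r_{-\theta} \circ f) > 0$. Thus the matrix representing $r_{-\theta} \circ f$ is of the form

\[
\begin{pmatrix} r & 0 \\ 0 & s \end{pmatrix}
\]

with r > 0, s > 0. Since $r_{-\theta} \circ f$ preserves angles, it must carry the line through $\begin{pmatrix} 1 \\ 1 \end{pmatrix}$ into itself. To say this is to say that r = s. Thus

\[
r_{-\theta} \circ f = \begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix}.
\]

The matrix representing f is therefore of the form

\[
F = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}\begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix}
= \begin{pmatrix} r\cos\theta & -r\sin\theta \\ r\sin\theta & r\cos\theta \end{pmatrix}
= \begin{pmatrix} a & -b \\ b & a \end{pmatrix},
\]

where $a = r\cos\theta$, $b = r\sin\theta$. It is clear that any such matrix preserves angle and satisfies $\mathrm{Det}\,F = a^2 + b^2 = r^2 > 0$.

Conversely, any non-zero matrix of the form $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ preserves angle and orientation since, starting with a and b, we can set $r = \sqrt{a^2 + b^2}$ and then find $\theta$ such that $\cos\theta = a/r$ and $\sin\theta = b/r$, since $\sin^2\theta + \cos^2\theta = (a^2 + b^2)/r^2 = 1$. It therefore follows that

\[
\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = \begin{pmatrix} r\cos\theta & -r\sin\theta \\ r\sin\theta & r\cos\theta \end{pmatrix}.
\]

Thus a matrix preserves angle and orientation if and only if it is of the form

\[
\begin{pmatrix} a & -b \\ b & a \end{pmatrix}, \qquad a^2 + b^2 \neq 0,
\]

with

\[
\mathrm{Det}\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = a^2 + b^2.
\]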

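The product of any two such matrices is clearly such a matrix, but notice in addition
that

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}\begin{pmatrix} a' & -b' \\ b' & a' \end{pmatrix} = \begin{pmatrix} aa' - bb' & -(ab' + ba') \\ ba' + ab' & aa' - bb' \end{pmatrix} = \begin{pmatrix} a' & -b' \\ b' & a' \end{pmatrix}\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$$

so that, in this case, multiplication is commutative. Furthermore, the inverse exists,
unless $a = b = 0$, since the determinant $= a^2 + b^2$. Finally

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} + \begin{pmatrix} a' & -b' \\ b' & a' \end{pmatrix} = \begin{pmatrix} a + a' & -(b + b') \\ b + b' & a + a' \end{pmatrix}$$

so that the sum of two matrices of the form $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ is again of this type. This is
somewhat remarkable and not to have been expected from the definition. Let us call
a matrix of the form $\begin{pmatrix} a & -b \\ b & a \end{pmatrix}$ conformal. (We allow the possibility that $a = b = 0$.
Thus the non-zero conformal matrices are the ones that preserve angles and
orientation.)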

We have proved that the set of all conformal matrices is closed under addition and
multiplication, that multiplication is commutative for such matrices, and each
non-zero conformal matrix has an inverse. Thus conformal matrices behave very much
like numbers.
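The closure and commutativity claims can be checked numerically. The following is a minimal sketch, assuming we store a 2 x 2 matrix as a nested tuple; the helper names `conformal`, `matmul` and `as_complex` are our own illustrative choices and do not come from the text.

```python
def conformal(a, b):
    """The conformal matrix with first column (a, b), stored as nested tuples."""
    return ((a, -b), (b, a))

def matmul(M, N):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def as_complex(M):
    """Read off a + bi from the first column (a, b) of a conformal matrix."""
    return complex(M[0][0], M[1][0])

P = conformal(1, 2)    # plays the role of 1 + 2i
Q = conformal(3, -1)   # plays the role of 3 - i

PQ = matmul(P, Q)
print(PQ)                                                # ((5, -5), (5, 5))
print(PQ == matmul(Q, P))                                # True: the product commutes
print(as_complex(PQ) == as_complex(P) * as_complex(Q))   # True: matches (1+2i)(3-i)
```

The product is again of the form with equal diagonal entries and opposite off-diagonal entries, which is what closure under multiplication asserts.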

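We can write any conformal matrix as

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = a\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + b\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$

Notice that $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$ is rotation through ninety degrees and thus

$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = -\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$

We write $1$ for $\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $i$ for $\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$, so that

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix} = a1 + bi$$

where

$$i^2 = -1.$$

In other words, we can identify the set of conformal matrices with the set of complex
numbers.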

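The usual representation of a complex number as a point in the plane is simply the
identification of the complex number with the image of $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ under the corresponding
conformal matrix:

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} a \\ b \end{pmatrix}.$$

It is very easy to compute the nth power of a conformal matrix. Indeed, if we write

$$A = \begin{pmatrix} a & -b \\ b & a \end{pmatrix} = \begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix}\begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}$$

then, since $\begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix}$ commutes with all 2 x 2 matrices,

$$A^n = \begin{pmatrix} r & 0 \\ 0 & r \end{pmatrix}^n \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}^n = \begin{pmatrix} r^n & 0 \\ 0 & r^n \end{pmatrix}\begin{pmatrix} \cos n\theta & -\sin n\theta \\ \sin n\theta & \cos n\theta \end{pmatrix}.$$

Thus

$$\begin{pmatrix} r\cos\theta & -r\sin\theta \\ r\sin\theta & r\cos\theta \end{pmatrix}^n = \begin{pmatrix} r^n\cos n\theta & -r^n\sin n\theta \\ r^n\sin n\theta & r^n\cos n\theta \end{pmatrix}.$$

In the language of complex numbers, this says that if

$$z = r(\cos\theta + i\sin\theta)$$

then

$$z^n = r^n(\cos n\theta + i\sin n\theta)$$

and is known as DeMoivre’s theorem.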
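As a numerical illustration of DeMoivre's theorem, the sketch below compares the nth power of a conformal matrix obtained by repeated multiplication with the closed form $r^n$ times rotation through $n\theta$. The values of `r`, `theta` and `n` are arbitrary choices of ours, and the function names are hypothetical.

```python
import math

def matmul(M, N):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def power_by_multiplication(M, n):
    """M^n by repeated multiplication (n >= 0)."""
    result = ((1.0, 0.0), (0.0, 1.0))
    for _ in range(n):
        result = matmul(result, M)
    return result

def power_by_demoivre(r, theta, n):
    """Closed form: r^n times the rotation through the angle n*theta."""
    c, s = math.cos(n * theta), math.sin(n * theta)
    return ((r**n * c, -(r**n) * s), (r**n * s, r**n * c))

r, theta, n = 1.5, 0.7, 6   # arbitrary illustrative values
A = ((r * math.cos(theta), -r * math.sin(theta)),
     (r * math.sin(theta),  r * math.cos(theta)))

direct = power_by_multiplication(A, n)
closed = power_by_demoivre(r, theta, n)
max_error = max(abs(direct[i][j] - closed[i][j])
                for i in range(2) for j in range(2))
print(max_error < 1e-9)   # the two results agree up to rounding
```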

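Another way of computing $A^n$ is to use the binomial formula: since

$$\begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} = a\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & -b \\ b & 0 \end{pmatrix} = b\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

commute:

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}^n = a^n\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + na^{n-1}b\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix} + \tfrac{1}{2}n(n-1)a^{n-2}b^2\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 + \cdots$$

But

$$\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}^2 = -\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$

so

$$\begin{pmatrix} a & -b \\ b & a \end{pmatrix}^n = \left[a^n - \binom{n}{2}a^{n-2}b^2 + \binom{n}{4}a^{n-4}b^4 - \cdots\right]\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \left[na^{n-1}b - \binom{n}{3}a^{n-3}b^3 + \cdots\right]\begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}.$$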


In the next section we will provide an efficient algorithm for computing powers of 
any 2 x 2 matrix, not necessarily conformal. It will involve the notion of eigenvalue, 
a concept that plays a key role in quantum mechanics. 


2.2. Eigenvectors and eigenvalues 

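Let F be a linear transformation. We can ask whether F carries some line through
the origin into itself. (No non-trivial rotation has this property, for example, while
any non-zero singular transformation carries its image into itself.)

If $\mathbf{v}$ is a non-zero vector lying on such a line, we must have

$$F(\mathbf{v}) = \lambda\mathbf{v}$$

for some real number $\lambda$. If this equation holds with $\mathbf{v} \ne 0$, then $\lambda$ is called an
eigenvalue of F and $\mathbf{v}$ an eigenvector of F corresponding to $\lambda$. The equation says that
$(F - \lambda I)\mathbf{v} = 0$ with $\mathbf{v} \ne 0$, so $F - \lambda I$ must be singular, i.e., $\operatorname{Det}(F - \lambda I) = 0$. For
$F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ this reads $(a - \lambda)(d - \lambda) - bc = 0$, or

$$\lambda^2 - (\operatorname{tr} F)\lambda + \operatorname{Det} F = 0.$$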


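The polynomial

$$P(X) = X^2 - (a + d)X + (ad - bc)$$

is called the characteristic polynomial of $F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, and the equation

$$P(\lambda) = 0$$

is called the characteristic equation. It will have real roots

$$\lambda = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a + d)^2 - 4(ad - bc)}\right] = \tfrac{1}{2}\left[(a + d) \pm \sqrt{(a - d)^2 + 4bc}\right]$$

if and only if

$$(a - d)^2 + 4bc \ge 0.$$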


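If this occurs, we know that

$$\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix}\begin{pmatrix} d - \lambda & b \\ -c & -(a - \lambda) \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$

so that if $\begin{pmatrix} d - \lambda \\ -c \end{pmatrix}$ and $\begin{pmatrix} b \\ -(a - \lambda) \end{pmatrix}$ are non-zero they are eigenvectors and they both
lie on the same line, since

$$\operatorname{Det}\begin{pmatrix} d - \lambda & b \\ -c & -(a - \lambda) \end{pmatrix} = -\operatorname{Det}\begin{pmatrix} a - \lambda & b \\ c & d - \lambda \end{pmatrix} = 0.$$

If they are both zero, then $a = \lambda$, $b = c = 0$ and $d = \lambda$ so that

$$F = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} = \lambda I$$

and every (non-zero) vector in the plane is an eigenvector.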
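The recipe above translates directly into a short computation. The following is a sketch under our own naming (`real_eigenvalues`, `eigenvector`), applied to an illustrative matrix that does not appear in the text; it returns no eigenvalues when $(a - d)^2 + 4bc < 0$.

```python
import math

def real_eigenvalues(a, b, c, d):
    """Real roots of X^2 - (a+d)X + (ad-bc), or None if (a-d)^2 + 4bc < 0."""
    disc = (a - d) ** 2 + 4 * b * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    return ((a + d + root) / 2, (a + d - root) / 2)

def eigenvector(a, b, c, d, lam):
    """Return (d-lam, -c) or (b, -(a-lam)), whichever is non-zero.

    Returns None when both vanish, i.e. when F = lam*I and every
    non-zero vector is an eigenvector.
    """
    for v in ((d - lam, -c), (b, -(a - lam))):
        if v != (0, 0):
            return v
    return None

# Illustrative matrix F = (4 1; 2 3), chosen by us: trace 7, determinant 10.
lam1, lam2 = real_eigenvalues(4, 1, 2, 3)
print(lam1, lam2)                       # 5.0 2.0
print(eigenvector(4, 1, 2, 3, lam1))    # (-2.0, -2)
print(real_eigenvalues(0, -1, 1, 0))    # None: rotation through ninety degrees
```

For the eigenvalue 5 the vector (-2, -2) is indeed carried into (-10, -10), five times itself.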



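If $(a - d)^2 + 4bc > 0$, so that there are two distinct real eigenvalues, $\lambda_1$ and $\lambda_2$, then
$F \ne \lambda I$ and so there are only two lines through the origin left fixed, each spanned by
an eigenvector: $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ with eigenvalue $\lambda_1$ and $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$ with eigenvalue $\lambda_2$.

If we let B be the matrix

$$B = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}$$

we can combine the two equations for the eigenvectors to read

$$FB = B\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}.$$

Since the two eigenvectors lie on different lines, B is non-singular, and so

$$F = B\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}B^{-1}.$$

Conversely, if F can be written in this form, then $FB = B\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$,
so the first column of B is an eigenvector of F with eigenvalue $\lambda_1$ and, similarly, the
second column is an eigenvector with eigenvalue $\lambda_2$.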


Case 2. Repeated Real Root 


If $(a - d)^2 + 4bc = 0$, so that $P(X) = 0$ has a double root, $\lambda$, the situation is a little
more complicated. Consider the two matrices

$$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

For both of these matrices the characteristic polynomial is $P(X) = X^2$ so that $\lambda = 0$ is
a double root. Every non-zero vector in the plane is an eigenvector of the first matrix
while only the vectors $\begin{pmatrix} x \\ 0 \end{pmatrix}$ are eigenvectors for the second. Notice, however, that
both matrices satisfy the equation $F^2 = 0$, which we can write as $P(F) = 0$, i.e., we
substitute F (as if it were a number) into its own characteristic polynomial and we get
0. We claim that this is a general fact, called the Cayley-Hamilton theorem.

Given any matrix F whose characteristic polynomial is P(X) then

$$P(F) = 0.$$



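For our case of 2 x 2 matrices this can be verified by direct calculation:

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 = \begin{pmatrix} a^2 + bc & ab + bd \\ ac + cd & bc + d^2 \end{pmatrix},$$

$$(a + d)\begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} a^2 + ad & ab + bd \\ ac + cd & ad + d^2 \end{pmatrix}$$

and

$$(ad - bc)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} ad - bc & 0 \\ 0 & ad - bc \end{pmatrix}$$

so

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix}^2 - (a + d)\begin{pmatrix} a & b \\ c & d \end{pmatrix} + (ad - bc)\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}.$$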
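The direct calculation can also be carried out mechanically. The sketch below, with a function name of our own choosing, substitutes a 2 x 2 integer matrix into its characteristic polynomial; the sample matrix is an arbitrary illustration.

```python
def char_poly_at(F):
    """Evaluate P(F) = F^2 - (a+d) F + (ad-bc) I for F = ((a, b), (c, d))."""
    (a, b), (c, d) = F
    trace, det = a + d, a * d - b * c
    F2 = ((a * a + b * c, a * b + b * d),
          (c * a + d * c, c * b + d * d))
    I = ((1, 0), (0, 1))
    return tuple(
        tuple(F2[i][j] - trace * F[i][j] + det * I[i][j] for j in range(2))
        for i in range(2)
    )

# Arbitrary integer example: trace 6, determinant 23.
print(char_poly_at(((2, 5), (-3, 4))))   # ((0, 0), (0, 0))
```

Because the arithmetic is done in integers, the zero matrix comes out exactly rather than up to rounding.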


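If P(X) has a double root, $\lambda$, there are two possibilities: either $F = \lambda I$, so that
every non-zero vector is an eigenvector, or $F \ne \lambda I$.

In this second case, let $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$ be an eigenvector of F and let $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$ be some non-zero
vector which is not an eigenvector of F. Then

$$(F - \lambda I)\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$$

is an eigenvector of F since $(F - \lambda I)^2\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = 0$ by the Cayley-Hamilton theorem, and so
is some non-zero multiple of $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$. By multiplying $\begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$ by a suitable non-zero
constant, we can arrange that

$$(F - \lambda I)\begin{pmatrix} x_2 \\ y_2 \end{pmatrix} = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}.$$

Again the matrix

$$B = \begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}$$

is non-singular, and we can write the above equation, together with the eigenvector
equation for $\begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$, as

$$FB = B\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} \quad\text{or}\quad F = B\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}B^{-1}.$$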



Case 3. Complex Roots 

We still have to deal with the case of a transformation that has no real eigenvalues
or eigenvectors. The most obvious example of such a transformation is a rotation
through an angle that is not a multiple of $\pi$. Such a rotation clearly does not carry
any non-zero vector into a multiple of itself. More generally, a conformal
transformation, which may be viewed as a rotation of the plane followed by a
uniform ‘stretching’, will have no real eigenvectors.

Consider, now, what happens if we try to find eigenvalues and eigenvectors for a
conformal matrix

$$C = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}, \qquad y \ne 0.$$

The characteristic equation is

$$(x - \lambda)^2 + y^2 = 0, \quad\text{so that}\quad \lambda = x \pm iy.$$

We previously observed, in section 2.1, that the conformal matrix $\begin{pmatrix} x & -y \\ y & x \end{pmatrix}$ can
be used to represent the complex number x + iy; now we see that this complex
number is an eigenvalue of the matrix. Furthermore, given any pair of complex
conjugate numbers, x + iy and x - iy, there is a real conformal matrix that has
these numbers as its eigenvalues. Of course, we cannot interpret these complex
eigenvalues geometrically, since the associated eigenvectors have complex
components and cannot be regarded as vectors in the real plane.

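We will show that we can write a matrix F, whose eigenvalues are $x \pm iy$, in the
form

$$F = B\begin{pmatrix} x & -y \\ y & x \end{pmatrix}B^{-1}.$$

The characteristic polynomial of F is $P(X) = (X - x)^2 + y^2$, so by the Cayley-Hamilton
theorem the matrix $G = F - xI$ satisfies $G^2 = -y^2 I$. Choose any non-zero vector $\mathbf{v}_1$ and
set $\mathbf{v}_2 = y^{-1}G\mathbf{v}_1$, so that $G\mathbf{v}_1 = y\mathbf{v}_2$ and $G\mathbf{v}_2 = y^{-1}G^2\mathbf{v}_1 = -y\mathbf{v}_1$. Since G has no real
eigenvectors, $\mathbf{v}_2$ is not a multiple of $\mathbf{v}_1$, so the matrix B whose columns are $\mathbf{v}_1$ and $\mathbf{v}_2$ is
non-singular, and $GB = B\begin{pmatrix} 0 & -y \\ y & 0 \end{pmatrix}$. Thus

$$G = B\begin{pmatrix} 0 & -y \\ y & 0 \end{pmatrix}B^{-1},$$

$$F = xI + B\begin{pmatrix} 0 & -y \\ y & 0 \end{pmatrix}B^{-1} = B(xI)B^{-1} + B\begin{pmatrix} 0 & -y \\ y & 0 \end{pmatrix}B^{-1} \quad\text{since } B(xI) = (xI)B$$

$$= B\left[xI + \begin{pmatrix} 0 & -y \\ y & 0 \end{pmatrix}\right]B^{-1},$$

that is,

$$F = B\begin{pmatrix} x & -y \\ y & x \end{pmatrix}B^{-1}$$

as was to be proved.

For example, we can make the convenient choice $\mathbf{v}_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, though any other
choice would have been equally suitable. Then, since $G\mathbf{v}_1 = y\mathbf{v}_2$, we have
$\mathbf{v}_2 = y^{-1}G\mathbf{v}_1 = y^{-1}G\begin{pmatrix} 1 \\ 0 \end{pmatrix}$. Thus $\mathbf{v}_1$, the first column of B, is $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ while $\mathbf{v}_2$, the second
column, is the first column of G divided by y.

To summarize: if $F = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$ has eigenvalues $x \pm iy$, with $y \ne 0$, then $F = BCB^{-1}$ where

$$C = \begin{pmatrix} x & -y \\ y & x \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & (a - x)/y \\ 0 & c/y \end{pmatrix}.$$

Furthermore, $G = F - xI$ satisfies the equation $G^2 = -y^2 I$.

As an example, consider

$$F = \begin{pmatrix} 7 & -10 \\ 2 & -1 \end{pmatrix}.$$

Since Tr F = 6 and Det F = 13, the eigenvalues are

$$\lambda = \tfrac{1}{2}\left(6 \pm \sqrt{36 - 52}\right) = 3 \pm 2i.$$

Thus x = 3, y = 2. The matrix

$$G = F - 3I = \begin{pmatrix} 4 & -10 \\ 2 & -4 \end{pmatrix}$$

satisfies $G^2 = -4I$, as expected. To construct the second column of B, we just
divide the first column of G by y: $\tfrac{1}{2}\begin{pmatrix} 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$. Hence $B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ and

$$F = BCB^{-1} = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 3 & -2 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}.$$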
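The construction of B and C for a matrix with eigenvalues $x \pm iy$ can be sketched in a few lines. We assume exact rational arithmetic via Python's `fractions` module and take the first column of B to be (1, 0); the function names are our own, and the sample matrix is the one whose $G = F - 3I$ has rows (4, -10) and (2, -4), with x = 3 and y = 2.

```python
from fractions import Fraction

def matmul(M, N):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(M):
    """Inverse of a non-singular 2x2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return ((d / det, -b / det), (-c / det, a / det))

def conformal_factorization(F, x, y):
    """B and C with F = B C B^{-1}, taking the first column of B to be (1, 0)."""
    (a, b), (c, d) = F
    # Second column of B is the first column of G = F - xI divided by y.
    B = ((Fraction(1), Fraction(a - x, y)),
         (Fraction(0), Fraction(c, y)))
    C = ((x, -y), (y, x))
    return B, C

F = ((7, -10), (2, -1))          # G = F - 3I = (4 -10; 2 -4)
B, C = conformal_factorization(F, 3, 2)
print(B == ((1, 2), (0, 1)))                      # True
print(matmul(matmul(B, C), inverse(B)) == F)      # True
```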


large) values of n. (In the next section, we shall give an instance where this problem
is of interest.)


Case 1. Real Distinct Roots 

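If

$$\Lambda = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$$

then clearly

$$\Lambda^n = \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix}.$$

So computing the powers of a diagonal matrix is reduced to computing the powers
of real numbers. If

$$F = B\Lambda B^{-1}$$

then

$$F^2 = B\Lambda B^{-1}B\Lambda B^{-1} = B\Lambda^2 B^{-1}$$

and (by induction)

$$F^n = B\Lambda^n B^{-1}.$$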
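To see the formula $F^n = B\Lambda^n B^{-1}$ at work, here is a small sketch in exact arithmetic. The matrix F = (4 1; 2 3), with eigenvalues 5 and 2 and eigenvectors (1, 1) and (1, -2), is an example of our own, as are the function names.

```python
from fractions import Fraction

def matmul(M, N):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def inverse(M):
    """Inverse of a non-singular 2x2 matrix, in exact arithmetic."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return ((d / det, -b / det), (-c / det, a / det))

def power_by_diagonalization(B, lam1, lam2, n):
    """F^n = B diag(lam1^n, lam2^n) B^{-1}, where the columns of B are eigenvectors."""
    Lam_n = ((lam1 ** n, 0), (0, lam2 ** n))
    return matmul(matmul(B, Lam_n), inverse(B))

F = ((4, 1), (2, 3))     # illustrative matrix with eigenvalues 5 and 2
B = ((1, 1), (1, -2))    # columns: eigenvectors for 5 and for 2

# Compare with five explicit multiplications.
direct = ((1, 0), (0, 1))
for _ in range(5):
    direct = matmul(direct, F)

print(power_by_diagonalization(B, 5, 2, 5) == direct)   # True
```

Only two scalar powers are needed however large n is, which is the point of the method.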


Case 2. Repeated Real Root 

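Next let us examine the matrix

$$\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix} = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Now $\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$ commutes with all 2 x 2 matrices and

$$\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}^2 = 0$$

so we may apply the binomial formula:

$$\left[\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix} + \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}\right]^n = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}^n + n\begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}^{n-1}\begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$$

(as the remaining terms in the binomial formula vanish). Thus

$$\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}^n = \begin{pmatrix} \lambda^n & n\lambda^{n-1} \\ 0 & \lambda^n \end{pmatrix}.$$

So if

$$F = B\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}B^{-1}$$

then

$$F^n = B\begin{pmatrix} \lambda^n & n\lambda^{n-1} \\ 0 & \lambda^n \end{pmatrix}B^{-1}.$$


Case 3. Complex Roots

If $F = BCB^{-1}$, where C is conformal, then

$$F^n = BC^nB^{-1}$$

where we can compute $C^n$ by either of the two methods given at the end of
section 2.1.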

Thus, for each of the three possibilities listed above (distinct real eigenvalues, 
repeated eigenvalues, complex eigenvalues), we have a simple method for computing 
the powers of a matrix F, once we have computed the eigenvalues and the change 
of basis matrix B. 

Actually, for the last two cases, we do not have to compute B: for case 2,

$(F - \lambda I) = N$ satisfies $N^2 = 0$
so

$$F^n = (\lambda I + N)^n = \lambda^n I + n\lambda^{n-1}N.$$
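The shortcut for a repeated root needs no change of basis at all. A minimal sketch, assuming integer entries and n at least 1; the function name and the sample matrix F = (3 1; -1 1), whose double eigenvalue is 2, are our own illustrative choices.

```python
def matmul(M, N):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(M[i][k] * N[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )

def power_repeated_root(F, lam, n):
    """F^n = lam^n I + n lam^(n-1) N with N = F - lam*I, valid when N^2 = 0 (n >= 1)."""
    (a, b), (c, d) = F
    N = ((a - lam, b), (c, d - lam))
    assert matmul(N, N) == ((0, 0), (0, 0)), "lam must be a double root"
    s, t = lam ** n, n * lam ** (n - 1)
    return ((s + t * N[0][0], t * N[0][1]),
            (t * N[1][0], s + t * N[1][1]))

F = ((3, 1), (-1, 1))    # trace 4, determinant 4: double root 2
print(power_repeated_root(F, 2, 3))                              # ((20, 12), (-12, -4))
print(power_repeated_root(F, 2, 3) == matmul(matmul(F, F), F))   # True
```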



In this section we give an application of matrix multiplication to probability. We 
do not want to write a whole introductory treatise on the theory of probability. 
We just summarize the most basic facts: Probability assignments assign real
numbers

$$0 \le p(A) \le 1, \quad 0 \le p(B) \le 1, \ldots$$

to ‘events’ A, B, etc., according to certain rules. These are:

The probability of an event that is certain is 1;

The probability of an event that is impossible is 0;

If the event A can occur in k mutually exclusive ways (we write this as

$$A = A_1 \cup \cdots \cup A_k, \qquad A_i \cap A_j = \emptyset \text{ for } i \ne j)$$

then

$$p(A) = p(A_1) + \cdots + p(A_k).$$

In particular, if $A^c$ denotes the ‘complementary event’, the event that A does not
occur, then

$A \cup A^c$ is certain (either A will occur or not)

and

$$A \cap A^c = \emptyset$$

so

$$p(A) + p(A^c) = 1.$$

One also has ‘conditional probabilities’: 

$p(B|A)$ = the conditional probability of B given A.

Thus, if A is the event ‘it is raining today’ and B is the event ‘it is clear tomorrow’,
then $p(B|A)$ is the probability that it will be clear tomorrow given that it is raining
today. We then have the rule

$$p(A \cap B) = p(B|A)\,p(A)$$

i.e.,

the probability of A and B equals the product of the conditional probability
of B given A with the probability of A.

In particular, if $A_1, \ldots, A_k$ are mutually exclusive alternatives, $A_i \cap A_j = \emptyset$, and B
can occur only if one of the events $A_i$ occurs:

$$B = (B \cap A_1) \cup \cdots \cup (B \cap A_k)$$

then

$$p(B) = p(B \cap A_1) + \cdots + p(B \cap A_k),$$

so

$$p(B) = p(B|A_1)\,p(A_1) + \cdots + p(B|A_k)\,p(A_k).$$

We shall now consider a system which can exist in one of two states; a switch
might be on or off, or, in a game of badminton, ‘state 1’ might denote the situation
where player number 1 is serving while ‘state 2’ is where player number 2 is serving.
We envisage a situation in which in one ‘step’ there can be a ‘transition’ from one 
state to another. Thus, in our badminton example, at each ‘step’ in the process 
(at each point of the game), the system can stay in the same state (server makes 
the point and serves again) or make a transition from one state to the other (server 
loses the point and opponent gets to serve). For example, we can imagine that at 
some stage of the game if player 1 is serving, he has probability 0.8 of winning 
the point and probability 0.2 of losing, while if player 2 is to serve, then she has 
probability 0.7 of winning the point and probability 0.3 of losing. In a real game, 
the probability of a given player winning a point at some stage of the game 
will depend on a whole lot of factors (how encouraged or demoralized he is by 
the game up to that stage, how tired she is, etc.). We make the drastic assumption 





that none of these considerations matter, that all that matters is who are the
opponents and who is serving. We can thus summarize the above probability
assignments by the matrix

$$\begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}.$$

Thus 0.8 represents the conditional probability of the system being in state 1 after
the step if it is in state 1 before the step, while 0.2 represents the conditional
probability of being in state 2 after the step if the system was in state 1 before the step.

In general a (discrete time, two-state, stationary) Markov process is a process 
in which the states can change in discrete units of time, but where the probability 
of transition from one state to another depends only on the state the system is in, 
not on the past history of the system or on the time that the transition is taking 
place. Thus there are four ‘transition probabilities’ which can be arranged as a
matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$$

where

a = probability of transition from state 1 to state 1;
b = probability of transition from state 2 to state 1;
c = probability of transition from state 1 to state 2;
d = probability of transition from state 2 to state 2.

Suppose that we do not know what state the system is in at a given time; all that
we know is that there is probability p that it is in state 1 and probability
q = 1 - p that it is in state 2. This probability assignment can be represented by
the vector

$$\mathbf{v} = \begin{pmatrix} p \\ q \end{pmatrix}.$$

After one step, the law for conditional probability says that

(probability of being in state 1 after the step)
  = (trans. prob. from state 1 to state 1) x (prob. of being in state 1)
  + (trans. prob. from state 2 to state 1) x (prob. of being in state 2)
  = ap + bq

and similarly the probability of being in state 2 after one step is

cp + dq.

In other words, the new ‘probability vector’ is

$$\begin{pmatrix} ap + bq \\ cp + dq \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} p \\ q \end{pmatrix} = A\mathbf{v}.$$




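Let us illustrate this in our badminton example. Suppose we know that player 1
is to serve the first point. The vector

$$\mathbf{v}_0 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

then represents the initial probability vector at the beginning of the game. After
the first point, the probability vector is

$$\mathbf{v}_1 = A\mathbf{v}_0 = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix}.$$

After the second point, it is

$$\mathbf{v}_2 = A\mathbf{v}_1 = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}\begin{pmatrix} 0.8 \\ 0.2 \end{pmatrix} = \begin{pmatrix} 0.7 \\ 0.3 \end{pmatrix} = A^2\mathbf{v}_0.$$

After the third point,

$$\mathbf{v}_3 = A\mathbf{v}_2 = \begin{pmatrix} 0.8 & 0.3 \\ 0.2 & 0.7 \end{pmatrix}\begin{pmatrix} 0.7 \\ 0.3 \end{pmatrix} = \begin{pmatrix} 0.65 \\ 0.35 \end{pmatrix} = A^3\mathbf{v}_0$$

and so on. In general, the effect of playing n points is represented by the matrix $A^n$.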
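The step-by-step evolution of the probability vector is easy to reproduce. The sketch below uses the transition matrix with columns (0.8, 0.2) and (0.3, 0.7) and starts with player 1 serving; exact fractions are our own choice, made to avoid rounding noise, and the function name `step` is hypothetical.

```python
from fractions import Fraction

# Transition matrix of the badminton example: each column sums to 1.
A = ((Fraction(8, 10), Fraction(3, 10)),
     (Fraction(2, 10), Fraction(7, 10)))

def step(A, v):
    """One step of the process: the new probability vector A v."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

v = (Fraction(1), Fraction(0))   # player 1 serves the first point
history = [v]
for _ in range(3):
    v = step(A, v)
    history.append(v)

for k, (p, q) in enumerate(history):
    print(k, float(p), float(q))
# 0 1.0 0.0
# 1 0.8 0.2
# 2 0.7 0.3
# 3 0.65 0.35
```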

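On thinking about this situation, you may realize that the probability vector
after a large number of steps ought to be practically independent of the initial
state: whether player 1 is serving for the fifteenth point is unlikely to depend
strongly on which player served for the first point. This suspicion is confirmed by
computing successive powers of A. We find

$$A^2 = \begin{pmatrix} 0.7 & 0.45 \\ 0.3 & 0.55 \end{pmatrix}, \qquad A^4 = \begin{pmatrix} 0.7 & 0.45 \\ 0.3 & 0.55 \end{pmatrix}^2 \approx \begin{pmatrix} 0.63 & 0.56 \\ 0.37 & 0.44 \end{pmatrix},$$

$$A^8 \approx \begin{pmatrix} 0.63 & 0.56 \\ 0.37 & 0.44 \end{pmatrix}^2 \approx \begin{pmatrix} 0.602 & 0.598 \\ 0.398 & 0.402 \end{pmatrix},$$

$$A^{16} \approx \begin{pmatrix} 0.602 & 0.598 \\ 0.398 & 0.402 \end{pmatrix}^2 \approx \begin{pmatrix} 0.600006 & 0.599994 \\ 0.399994 & 0.400006 \end{pmatrix}$$

and we might conjecture that

$$\lim_{n \to \infty} A^n = \begin{pmatrix} 0.6 & 0.6 \\ 0.4 & 0.4 \end{pmatrix}$$

exactly.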

In fact it is easy to show in general that, as long as b and c do not both equal
0 or both equal 1, $\lim_{n \to \infty} A^n$ exists. We need only determine the eigenvalues and
eigenvectors of A. Since a + c = 1, b + d = 1, we may write

$$A = \begin{pmatrix} 1 - c & b \\ c & 1 - b \end{pmatrix}.$$

Since Tr A = 2 - (b + c) and Det A = 1 - b - c + bc - bc = 1 - (b + c), the
characteristic equation is

$$\lambda^2 - [2 - (b + c)]\lambda + 1 - (b + c) = 0$$

or

$$(\lambda - 1)(\lambda - (1 - b - c)) = 0.$$

The eigenvalues are $\lambda_1 = 1$, $\lambda_2 = 1 - (b + c)$. Note that $|\lambda_2| \le 1$, with equality only
if b = c = 0 or if b = c = 1.

The eigenvectors are easily found by considering

$$A - \lambda_1 I = \begin{pmatrix} 1 - c & b \\ c & 1 - b \end{pmatrix} - \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -c & b \\ c & -b \end{pmatrix}.$$

The kernel of this singular matrix consists of multiples of the eigenvector
corresponding to $\lambda_1 = 1$: we normalize this vector so that its components sum to 1 and
find

$$\mathbf{v}_1 = \frac{1}{b + c}\begin{pmatrix} b \\ c \end{pmatrix}.$$

The image of $A - \lambda_1 I$ consists of multiples of the eigenvector corresponding to $\lambda_2$;
a convenient choice of this eigenvector is $\mathbf{v}_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$.
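The eigenvalue and eigenvector formulas for a two-state process can be checked in exact arithmetic. In the sketch below the function names are ours, and b = 0.3, c = 0.2 are the badminton values; the normalized eigenvector for the eigenvalue 1 is taken to be (b, c)/(b + c) and the second eigenvector to be (1, -1), with eigenvalue 1 - (b + c).

```python
from fractions import Fraction

def two_state_summary(b, c):
    """Matrix, second eigenvalue and both eigenvectors of a two-state process.

    b = probability of transition from state 2 to state 1,
    c = probability of transition from state 1 to state 2 (b + c must not be 0).
    """
    A = ((1 - c, b), (c, 1 - b))
    lam2 = 1 - (b + c)
    v1 = (b / (b + c), c / (b + c))   # eigenvalue 1, components sum to 1
    v2 = (1, -1)                      # eigenvalue lam2
    return A, lam2, v1, v2

def apply(A, v):
    """The matrix A applied to the vector v."""
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

b, c = Fraction(3, 10), Fraction(2, 10)   # badminton example
A, lam2, v1, v2 = two_state_summary(b, c)

print(v1)                                               # (Fraction(3, 5), Fraction(2, 5))
print(apply(A, v1) == v1)                               # True: eigenvalue 1
print(apply(A, v2) == (lam2 * v2[0], lam2 * v2[1]))     # True: eigenvalue 1/2
```

The vector (3/5, 2/5) agrees with the columns of the conjectured limit of the powers of the badminton matrix.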








positivity conditions. It is clear that

$$A^n = \begin{cases} I & \text{if } n \text{ is even,} \\ A & \text{if } n \text{ is odd.} \end{cases}$$

The meaning of the matrix A is obvious. It represents a sure transition to the
other state. There is no limit as $n \to \infty$. (Yet, in a certain average sense, we expect
to find each state occupied about half the time.)

It is a straightforward matter to represent Markov processes for systems with 
more than two states by larger matrices - a three-state process by a 3 x 3 matrix, 
and so on. The entries in each column are non-negative and sum to unity. A 
typical 3x3 stochastic matrix is 


/0.5 0 0.1\ 

A= 0.3 0.6 0 . 

\0.2 0.4 0.9/ 


The important features of the 2x2 case persist, with some differences. 


For instance

$$\begin{pmatrix} 0.5 & 0.3 & 0 & 0 \\ 0.5 & 0.7 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}$$

represents a system in which it is impossible to get from the first two states to the
last two, or back. A probability vector concentrated in the first two states
stays in those states, while one concentrated in the last two states (i.e., with first two
components zero) will move around and its value will depend on whether n is
even or odd. Conditions under which such cases occur can be stated in terms of the
matrix entries of A. With the exception of such cases, the
n-dimensional case is the same as the two-dimensional one - the matrix has an
eigenvalue of 1, with an associated eigenvector describing a limiting state, the
other eigenvalues are all less than one in absolute value, and $\lim_{n \to \infty} A^n$ is a singular
matrix which transforms any probability vector into the eigenvector corresponding to $\lambda = 1$.


Summary 

A Conformal matrices 

You should be able to identify a conformal matrix and describe in geometric terms 
the transformation that it represents. 

You should be able to state and apply the isomorphism between conformal 
matrices and complex numbers. 

B Eigenvalues and eigenvectors 

You should be able to form the characteristic equation of a 2 x 2 matrix and use it to 
determine the eigenvalues of the matrix. 

You should be able to determine eigenvectors corresponding to real eigenvalues
of a 2 x 2 matrix and describe the action of the matrix in terms of its eigenvectors and
eigenvalues.

C Similarity of matrices 

Given a 2 x 2 matrix A, you should be able to construct a matrix B so that
$A = BCB^{-1}$, where C is diagonal if A has distinct real eigenvalues, C is conformal if A
has complex eigenvalues, and C is of the form $\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}$ if A has a repeated
eigenvalue, but $A \ne \lambda I$. In each case you should be able to interpret the columns of B
geometrically.


D Markov processes

You should be able to write down the n x n matrix that represents a Markov process 
with n states. 

For a 2 x 2 matrix A that represents a Markov process, you should be able to 
relate the eigenvalues and eigenvectors of A to the behavior of the probabilities of 
the two states of the process. 



matrices. 

(b) Express $F_1$ and $F_2$ each as the product of a multiple of the identity
matrix and a rotation. Using the identity $e^{i\theta} = \cos\theta + i\sin\theta$, express $z_1$
and $z_2$ in ‘polar form’ $z = re^{i\theta}$.

(c) Calculate $F_1^{-1}$. Calculate $z_1^{-1}$, rationalizing the denominator.
Compare.

(d) Calculate $F_1F_2$ and $F_2F_1$. Calculate $z_1z_2$ and compare.


2.2 Explicitly verify DeMoivre’s theorem for the conformal matrices F, and 
F 2 of exercise 2.1; that is calculate F\ and F\. 

2.3(a) Show that $R = \begin{pmatrix} 0.8 & -0.6 \\ 0.6 & 0.8 \end{pmatrix}$ represents a counterclockwise rotation
through an angle of about 37°. Calculate $R^{-1}$.

(b) $S = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ represents a shear along the +x-axis. Calculate $S^{-1}$ and
interpret it geometrically.

(c) Calculate $A = RSR^{-1}$ and interpret it geometrically. Do the same for $A^{-1}$,
for $B = RS^{-1}R^{-1}$, and for $B^{-1}$.

2.4 Apply the diagonalization procedure to $F = \begin{pmatrix} -7 & 18 \\ -3 & 8 \end{pmatrix}$ as follows:

(a) Form the characteristic polynomial $P(\lambda)$ and set it equal to zero to find
the eigenvalues of F. (Answer: $\lambda = 2$, $\lambda = -1$.)

(b) Check that P(F) = 0, as promised by the Cayley-Hamilton theorem.

(c) Find an eigenvector for each eigenvalue. Let y = 1 in each eigenvector,
form the matrices B and $B^{-1}$, and confirm that $F = B\Lambda B^{-1}$.



it as Lt = BAB where B is a rotation and A is diagonal. Interpret the 


2.6 Find an invertible matrix B and a diagonal matrix D such that 

s (-i 'D jr, - ft 

/-1 9\ 

2.7 Diagonalize the matrix F = I ^ 1, which has a repeated eigenvalue, 

by the following procedure: 

(a) Form the characteristic polynomial P(X ) and find the eigenvalues. 


• I ■ /v 1 1 

(b) Find an eigenvector of F of the form I 1. 

(c) Form the matrix G = F — XI. Show that the Cayley-Hamilton 
theorem implies that G 2 — 0, and confirm this explicitly. Find the 
image and kernel of G. 


(d) Find a vector ) with the property that G 


Now form 


the matrices B and B 1 and 

. . _ 7 " (X 1 


F = B 


you have succeeded in writing 


IB- 1 . 


which has a repeated eigenvalue. Find the image and kernel of G = F — XL 


si uc a z, * z, nidiiiA wiLii cigciivcuuca A i ^ A 2 

(a) Describe a procedure for calculating the matrix G„ = X\ n A n easily by 
diagonalizing A. Show that the matrix F = lim„^ 00 G„ is singular. 

/ 3 — 2\ 

(b) Carry through this procedure for the matrix A = I ^ 1, calculat¬ 

ing G„ and F explicitly. Find the eigenvalues and eigenvectors of A, and 
find the image and kernel of the transformation F, and relate them to 
the eigenvectors of A. 

2.10 For any matrix A, the trace of the matrix, Tr A, is defined as the sum of the
entries on the principal diagonal. Thus, if $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, Tr A = a + d.

(a) Prove that if A and B are two 2 x 2 matrices, Tr (AB) = Tr (BA) even if
A and B do not commute.

(b) Prove that Tr A equals the sum of the eigenvalues of A. Conclude that
if $A = SBS^{-1}$, then Tr A = Tr B.

(c) Using the result of (a), prove that Tr (ABC) = Tr (BCA) = Tr (CAB).




form F = BCB~ l 


(a) r ina tne eigenvalues of F. 

construct a contormal matrix C 

witn tne same eigenvalues as F. 

f 1 c \ 


(c) Construct B in the form 

. 


2.13 Let F = 


(4 — 5\ 

2.12 Let v4 = I ^ I. Find a conformal matrix C, and a matrix S that 
represents a shear transformation, such that A — SCS -1 . 

/ a b\ 

Let F = 1 be a matrix with real distinct eigenvalues and X 2 . Let 


Vc dj 

x=\{X l + X 2 ), y = i{X x -X 2 ). 

(a) Show that H = F — xl obeys the equation H 2 = y 2 I. 

f x y\ 

(b) Show that S = has the same eigenvalues as F. 

\y xj 


(c) Devise a procedure for constructing a matrix B, whose first column is 




|, such that H = B 1 

(° 'j 

[5 1 and F — B\ 

( X y ) 

\B~\ _ 


Ko) 




Ky _xj 

r" ■ 


(d) Find a matrix R such that 

= B 


A, 0 


rpr+7 



m 

, ; zj oi 


Prove that BR\ 

voj 

1 and BR\ ^ j 

| are eigenvectors of F. 


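2.14 Let F be a 2 x 2 matrix with distinct real eigenvalues $\lambda_1$ and $\lambda_2$. Define

$$P_1 = \frac{F - \lambda_2 I}{\lambda_1 - \lambda_2}, \qquad P_2 = \frac{F - \lambda_1 I}{\lambda_2 - \lambda_1}.$$

Show that:

(a) $P_1$ and $P_2$ are projections: $P_1^2 = P_1$, $P_2^2 = P_2$.

(b) $P_1P_2 = P_2P_1 = 0$.

(c) $F = \lambda_1P_1 + \lambda_2P_2$.

(d) $F^n = \lambda_1^nP_1 + \lambda_2^nP_2$.

(e) Calculate $P_1$ and $P_2$ explicitly for the case $F = \begin{pmatrix} 3 & 4 \\ -1 & -2 \end{pmatrix}$ and use
the result to calculate $F^7$.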


2.15 Let F be a 2 x 2 matrix whose characteristic equation has roots $\lambda = x \pm iy$.
We can alternatively write $x \pm iy = re^{\pm i\theta}$, where $r = \sqrt{x^2 + y^2}$ and
$e^{i\theta} = \cos\theta + i\sin\theta$. If F is a conformal matrix, it rotates the plane
through angle $\theta$ and stretches it uniformly by a factor of r. This problem
explores the case where F is not necessarily conformal.



igenvectors an eigenvalues 


lint: F = 


, where C is conformal. 


(b) Write F = | 


in the form BCB and thereby find the 


smallest integer n tor which t " is a multiple ot ttie identity. Check your 
answer by direct multiplication. 

(c) Show that your answer to (b) follows from the Cayley-Hamilton 
theorem. 

-2 —15\ . . 


(d) Find a ‘square root’ of G — 


, i.e., find a matrix A such 


that A 2 = G. Reminder: 

$\cos^2\tfrac{1}{2}\theta = \tfrac{1}{2}(1 + \cos\theta)$, $\sin^2\tfrac{1}{2}\theta = \tfrac{1}{2}(1 - \cos\theta)$

2.16 Modernistic composer Allie A. Tory constructs his two-tone works by the 
following Markov process: 

1. If note N — 1 was an F, then the probability p N that note N is an F is f, 
while the probability q N that note N is a G is j. 

2. If note IV — 1 was a G, then the probability p N that note N is an F is 






[liT'iKTnrcnu 



corresponding to each eigenvalue, 


(d) Suppose that note 1 is an F, so that 



Show on a diagram the sequence of vectors 



Determine the limit of this sequence, and interpret it in terms of the 
eigenvectors of A. 

2.17 The quarterback of the Houston Eulers, who majored in probability 
theory in college, has devised a play-calling procedure with the following 
properties: 

1 _ If nlav N — 1 was a nass. then the nrobabilitv that Dlav A is a nass is 




(a) Construct the 2 x 2 matrix A which transforms the probabilities 


(Pn- i 

\<?JV- 1 

into the probabilities 



(b) Determine the eigenvalues of A and find an eigenvector of A 
corresponding to each eigenvalue. Illustrate on a diagram the 
action of A on each eigenvector. 

(c) No one knows how the quarterback decides what to do for play 1, 
but observation of game films shows that play 2 is a pass half the 
time, a run half the time. What are the probabilities 

C‘ 

\41 

for the first play? 


2.18 Professor Constantine Bayes has been teaching his course ‘Stochastic 
Methods in Classical Archaeology’ for decades. It is widel y known that 


Bayes selects examination questions 


ancient 


Greek urns which he keeps in his office, but the contents of the urns are 
secret. However, by analyzing the pattern of Bayes’ final examinations, 
which are on file in Lamont Library, students 


1. If the final examination in year N - 1 had a question on statues, the
final examination in year N will have a question on statues half the
time, a question on pottery half the time.


2. If the final examination in year N — 1 had a question on pottery, the 


final examination in year /V will have a question on statues { of the time, 


a question on pottery § of the time. 


(a) Write the matrix M which transforms the probabilities 


Pn- 




for a statue question or pottery question respectively, into the 


probabilities 


Pn 


for the next year’s final examination. 


(b) Find the eigenvectors and eigenvalues of M, and write M in the 
form SDS~ l , where D is diagonal. 

(c) By attending Bayes’ office hours regularly, a student has finally 
learned details of his method. Bayes has two urns, but he uses them 
once for the hour examination, then once again for the final 
examination, so that the matrix M represents two steps of a Markov 
process! By using your diagonalization of M, find the two possible 
matrices N for one step of the process. 


2.19 John and Eli are playing a game with a ball that can roll into one of two 
pockets labelled H and Y. John wants to keep the ball in H and Eli wants 
to keep it in Y. When it is John’s turn to play, if he finds the ball in H that is 
fine with him and he does nothing; but if he finds it in Y he attempts to roll 
it into pocket H. This takes some skill; the probability that he succeeds is 




there being a j chance that the ball will roll back into Y. When Eli’s turn 
comes, he does nothing if the ball is in Y, but tries to get it there if he finds it 
in H. Eli is less skillful than John and his probability of succeeding in his 
effort is only I- 


(b) Find a formula for the probability that the ball is in H after John’s nth 
play (i.e., after John has played n times and Eli (n — 1) times). 

(c) Suppose the game has been going on for a ‘long time’ and you look in 
just after Eli has played. What is the probability that the ball is now in 
//? How many turns constitutes a ‘long time’ if we want to be certain 
that this probability is correct within 0.001? 

2.20 A bank has instituted a policy to prevent the tellers’ lines from ever getting 
more than two persons long. If a third person arrives, all three customers 
are escorted into the manager’s office to receive high-level personal 
service, and the teller starts again with no line. Furthermore, an armed 
guard at the entrance to the bank assures that no more than one customer 
per minute can enter (it takes that long for a really thorough search). As a 
result, the length of a teller’s line is determined by the following Markov 



nothing happens. 

If the line has two customers, the probability is that one customer is 
served^ that a third customer arrives and all three are taken to the 
ma nager, le av ing no lin e, and \ that noth i ng happens.__ 





ime i mio me prooaDiuues ior u 
(b) At 9 am, when the bank opens, p t = 1. What are the probabilities 

( p A 

f p 2 ) at 9:03 am? 


(c) Find the eigenvalues and eigenvectors of M. 


(d) What is the limiting value of i p 2 j after the bank has been open for 


a long time? Estimate at what time the probabilities p u p 2 , and p 3 
will all be within 0.001 of these limiting values. 

(e) On the average, how many customers per minute are served? How 
many are taken to the manager’s office? 


I 




by the integer i. So there are n + 1 states: i = 0, i = 1, ■.., i = N. At each instant of 
time, one of the N balls is picked at random (i.e.. with probability 1 /N) and moved 


III ■ a f 1 


(U aUI ill ri 


according as me Dan 
these transitions are il 


was in the hi 


_ ox. ine proDaomties ol 

and (N — i)N ~ 1 respectively. Thus 


Pi-i,i = iN 1 
Pi+i,i =(N — i)AT _1 

Pj, i = j ^ i 1 or i + 1. 
For example, if N = 4, the 5x5 transition matrix is 

10 l 0 0 0 \ 


P = \ 0 


10 + 00 


lo 0 + 0 1 
\ 0 0 0 i 0 1 

Notice that P transforms any state with i even into a state with i odd and vice 
versa. Thus P 2 transforms even states into even states and odd states into odd 


Pi-2, i = i(i 


Pu = 
Pl + 2.l = 


= (N — i)(N — i — 1)AT 


Pn = 


4 u 8 ^ 
n 1 n 3 


)2 _ 3 


U 8 W 8 U 

\0 0 + 0 i 


Since transitions for P z are only between states of the same parity, we may as 
well consider the states i = 0,2,4 and i = 1,3,5,... separately. (In the above matrix, 
this means combining separately the matrices obtained by considering only the 
even-even positions and the odd-odd positions: 


i i 
4 8 


and R — 


2.21.(a) For the matrix Q show that 6 lis an eigenvector with eigenvalue 1 and 


1 




?) Do the same computations for TV = 5: Show that the ‘even’ eigenvector 
with eigenvalue 1 is proportional to 10 land the ‘odd’ one is propor- 

V 5 / 


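(c) Prove in the general case that the 'even' and 'odd' eigenvectors with
eigenvalue 1 are proportional to the vector whose entries are the even or
odd binomial coefficients. In other words,

\[ \begin{pmatrix} \binom{N}{0} \\ \binom{N}{2} \\ \binom{N}{4} \\ \vdots \end{pmatrix} \text{ is the even eigenvector and } \begin{pmatrix} \binom{N}{1} \\ \binom{N}{3} \\ \binom{N}{5} \\ \vdots \end{pmatrix} \text{ is the odd eigenvector.} \]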


iiit 


till tji) «i 11] tJifi I m 11 ■ I ■ KKl till L'J 


•i'ilfd'i!!*!!' Ill'll LM 


many interesting physical and
mathematical problems ranging from plant growth to celestial mechanics
(see, for example, D'Arcy Thompson's On Growth and Form). The
recursion which generates the sequence is

\[ x_{n+2} = x_{n+1} + x_n. \]

(a) Compute the ratio $x_{n+1}/x_n$ for n = 1 up to 8 or so. Do you think this
sequence has a limit?


Use A to express 


in terms of 


(c) Find an explicit expression for $x_n$ in terms of $x_1$ and $x_0$. (Hint:
Diagonalize A.)

(d) Show that $\lim_{n\to\infty} x_{n+1}/x_n$ exists and compute its value.

(e) What does (d) tell you about the infinite continued fraction 


h that lim„^ m x„ ^, /x„ di 
The principal goal of Chapter 3 is to explain that a system
of homogeneous linear differential equations with constant
coefficients can be written as $du/dt = Au$ where A is a matrix
and u is a vector, and that the solution can be written as
$e^{At}u_0$ where $u_0$ gives the initial conditions. This of course
requires us to explain what is meant by the exponential of
a matrix. We also describe the qualitative behavior of
solutions and the inhomogeneous case, including a discussion
of resonance.


3.1. Functions of matrices 


We have already encountered (in our discussion of the Cayley-Hamilton theorem)
a 'polynomial in a matrix'. More generally, let $Q(X)$ be any polynomial. If

\[ Q(X) = a_nX^n + a_{n-1}X^{n-1} + \cdots + a_1X + a_0 \]

then we define the matrix $Q(F)$ by

\[ Q(F) = a_nF^n + a_{n-1}F^{n-1} + \cdots + a_1F + a_0I. \]


Now we can multiply two polynomials, $(Q_1Q_2)(X) = Q_1(X)Q_2(X)$, to obtain a third.
Similarly

\[ (Q_1Q_2)(F) = Q_1(F)Q_2(F). \]

There is no problem with the fact that in general matrix multiplication is not
commutative, since powers of a fixed matrix always commute with one another:

\[ F^kF^l = F^{k+l} = F^lF^k \]

on account of the associative law. Similarly,

\[ (Q_1 + Q_2)(F) = Q_1(F) + Q_2(F). \]

Thus there is no trouble in evaluating a polynomial function at a fixed matrix,
and the usual algebraic laws are satisfied. We would like to consider some more
general functions of matrices, and for this we need a slight digression about power
series.

An expression of the form

\[ R(X) = a_0 + a_1X + a_2X^2 + \cdots + a_nX^n + \cdots \]

where the $a_i$, $i = 0, 1, \ldots$, are real numbers and X is a symbol (as is $X^k$ for all k)
is called a formal power series. We add two power series according to the rule

\[ (a_0 + a_1X + a_2X^2 + \cdots) + (b_0 + b_1X + b_2X^2 + \cdots) = (a_0 + b_0) + (a_1 + b_1)X + (a_2 + b_2)X^2 + \cdots, \]

that is, we add the coefficients term by term. We multiply two power series by using
the rule $X^k \cdot X^l = X^{k+l}$ and collecting coefficients:

\[ (a_0 + a_1X + a_2X^2 + \cdots)(b_0 + b_1X + b_2X^2 + \cdots) = a_0b_0 + (a_1b_0 + a_0b_1)X + (a_2b_0 + a_1b_1 + a_0b_2)X^2 + \cdots. \]

Thus, for instance,

\[ (1 + X + X^2 + \cdots)\cdot(1 + X + X^2 + \cdots) = 1 + 2X + 3X^2 + \cdots. \]

It is easy to check that all the usual rules for addition and multiplication of
polynomials hold equally well for formal power series.
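
The rule for multiplying formal power series is easy to try out on a computer. The following Python sketch stores a series as the list of its first few coefficients; the function name `multiply_series` and the truncation length of six terms are illustrative choices of ours, not notation from the text. It reproduces the instance just given.

```python
def multiply_series(a, b):
    """Multiply two truncated formal power series.

    a[k] and b[k] are the coefficients of X^k. The coefficient of X^k in
    the product only involves a[0..k] and b[0..k], so the first
    min(len(a), len(b)) coefficients of the product are exact.
    """
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


geometric = [1] * 6                      # 1 + X + X^2 + ... (six terms kept)
square = multiply_series(geometric, geometric)
print(square)                            # coefficients of 1 + 2X + 3X^2 + ...
```

Only finitely many coefficients are ever computed, which is all that the collection rule requires: no question of convergence arises at this stage.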


We define the formal power series $\exp(tX)$ by

\[ \exp(tX) = 1 + tX + \frac{1}{2!}t^2X^2 + \frac{1}{3!}t^3X^3 + \cdots + \frac{1}{k!}t^kX^k + \cdots. \qquad (3.1) \]

Then

\[ \exp(sX)\exp(tX) = \left(1 + sX + \frac{1}{2!}s^2X^2 + \cdots\right)\left(1 + tX + \frac{1}{2!}t^2X^2 + \cdots\right) = 1 + (s + t)X + \frac{1}{2!}(s^2 + 2st + t^2)X^2 + \cdots \]

where, on the right, the coefficient of $X^n$ is

\[ \frac{1}{n!}\left[s^n + ns^{n-1}t + \frac{n(n-1)}{2!}s^{n-2}t^2 + \cdots + t^n\right] \]

which, by the binomial theorem, is just $(1/n!)(s + t)^n$. Since

\[ 1 + (s + t)X + \frac{1}{2!}(s + t)^2X^2 + \frac{1}{3!}(s + t)^3X^3 + \cdots = \exp((s + t)X) \]

by definition, we conclude that

\[ \exp(sX)\exp(tX) = \exp((s + t)X) \qquad (3.2) \]

as an identity in formal power series.
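
Identity (3.2) can be checked coefficient by coefficient with exact rational arithmetic. In the sketch below the helper names, the sample values of s and t, and the number of coefficients compared are all assumptions made for illustration.

```python
from fractions import Fraction
from math import factorial


def exp_series(t, order):
    """First `order` coefficients t^k / k! of the formal series exp(tX)."""
    return [Fraction(t) ** k / factorial(k) for k in range(order)]


def multiply_series(a, b):
    """Product of two truncated series, exact up to the common length."""
    n = min(len(a), len(b))
    return [sum(a[i] * b[k - i] for i in range(k + 1)) for k in range(n)]


s, t = Fraction(2, 3), Fraction(-5, 7)   # arbitrary sample values
order = 8
lhs = multiply_series(exp_series(s, order), exp_series(t, order))
rhs = exp_series(s + t, order)
print(lhs == rhs)
```

Because `Fraction` arithmetic is exact, agreement of the two lists is a genuine check of the binomial identity for these eight coefficients, not an approximation.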

In contrast to polynomials we cannot, in general, 'evaluate' a formal power
series $R(X)$ at a number r, or at a matrix F. That is, if we try to substitute the
real number r for the symbol X in

\[ R(X) = a_0 + a_1X + a_2X^2 + \cdots \]

we get an 'infinite sum' of numbers

\[ a_0 + a_1r + a_2r^2 + \cdots \]

which, as it stands, makes no sense. One way of trying to make sense of such an
infinite sum is to chop off the end at some finite value, so as to get a finite sum,
and to hope that the place where we chop it off makes little difference, provided that
we go out far enough. We would then assign to $R(r)$ the value obtained as the
'limiting value' of the finite sum. Let us explain this procedure more precisely. For
any integer M, define $R_M(r)$ to be the finite sum

\[ R_M(r) = a_0 + a_1r + a_2r^2 + \cdots + a_Mr^M. \]

We say that the power series $R(X)$ converges at the number r if, for any positive
number $\varepsilon$, no matter how small, we can find some large enough $M_0$ so that for
any integers M and $N > M_0$ we have

\[ |R_M(r) - R_N(r)| < \varepsilon. \]

In other words, if we go far enough out, all the values $R_M(r)$ lie in some interval
of length $\varepsilon$. Thus the further out we go, the closer the $R_M(r)$ cluster about some
limiting value, and this limiting value is what we call $R(r)$.


We can now make essentially the same definition for matrices. For any matrix
F the expression

\[ R_M(F) = a_0I + a_1F + \cdots + a_MF^M \]

makes perfectly good sense. The difference

\[ R_M(F) - R_N(F) \]

is again a matrix, and we shall take the condition

\[ |R_M(F) - R_N(F)| < \varepsilon \]

to mean that each of the four entries of the matrix $R_M(F) - R_N(F)$ has absolute
value less than $\varepsilon$. We say that $R(X)$ converges at F if for any $\varepsilon > 0$ there is an $M_0$
such that $|R_M(F) - R_N(F)| < \varepsilon$ for M and $N > M_0$. When this happens each of the
entries of $R_M(F)$ clusters about some limiting value as we go out far enough. We
thus get a matrix of limiting values and this limiting matrix is denoted by $R(F)$.

It is clear that if $R_1(X)$, $R_2(X)$ and $R_3(X)$ are formal power series such that

\[ R_1(X)R_2(X) = R_3(X) \]

and if all three of these series converge at F then

\[ R_1(F)R_2(F) = R_3(F) \]

since we can replace each $R_i(X)$ by a finite approximation. Similarly for addition:
if $R_1(X) + R_2(X) = R_3(X)$ and the series all converge at F, then $R_1(F) + R_2(F) = R_3(F)$.


3.2. The exponential of a matrix 

We have a formal power series (3.1) for the exponential function and we know
the identity (3.2). We will now prove that the power series for $\exp(tX)$ converges
when we substitute a 2 x 2 matrix A for the symbol X. As a first step, we review
that the series converges absolutely when we substitute any real number k.
We consider the power series

\[ \exp y = 1 + y + \frac{1}{2!}y^2 + \frac{1}{3!}y^3 + \cdots, \qquad y = |tk|. \]

Every term of this series is positive, so we guarantee absolute convergence of $\exp tk$ once we
show that the 'remainders' $r_{m,n}$ formed by summing terms from the mth to the (m + n)th term
can be made as small as we like when we choose a sufficiently large m. We compare

\[ r_{m,n} = \frac{1}{m!}y^m + \frac{1}{m!(m+1)}y^{m+1} + \frac{1}{m!(m+1)(m+2)}y^{m+2} + \cdots + \frac{1}{m!(m+1)\cdots(m+n)}y^{m+n} \]

with the geometric series

\[ s_{m,n} = \frac{1}{m!}y^m + \frac{1}{m!\,m}y^{m+1} + \frac{1}{m!\,m^2}y^{m+2} + \cdots + \frac{1}{m!\,m^n}y^{m+n}. \]

Clearly, $r_{m,n} \le s_{m,n}$ for all m. But we can sum $s_{m,n}$ explicitly:

\[ s_{m,n} = \frac{y^m}{m!}\left[1 + \frac{y}{m} + \left(\frac{y}{m}\right)^2 + \cdots + \left(\frac{y}{m}\right)^n\right] = \frac{y^m}{m!}\,\frac{1 - (y/m)^{n+1}}{1 - y/m}, \]

so that, for m > y, $s_{m,n} \le s_m$ where

\[ s_m = \frac{y^m}{m!}\,\frac{1}{1 - y/m}. \]

Suppose we choose m > 2y, so that

\[ \frac{1}{1 - y/m} < \frac{1}{1 - \tfrac12} = 2. \]

Then $s_m < 2y^m/m!$, and, whenever we increase m by 1, we multiply $s_m$ by a factor
which is less than $\tfrac12$. Clearly, by choosing m large enough, we can make $s_m$ as
small as we like, and since $r_{m,n} \le s_m$, we can thus make $r_{m,n}$ as small as we like. It
follows that the series

\[ \exp tk = 1 + tk + \frac{1}{2!}(tk)^2 + \frac{1}{3!}(tk)^3 + \cdots \]

for the exponential function converges absolutely. Incidentally, the well-known 'ratio test', in which one proves
that a series

\[ a_0 + a_1y + a_2y^2 + a_3y^3 + \cdots \]

converges absolutely whenever

\[ \lim_{m\to\infty}\left|\frac{a_{m+1}y^{m+1}}{a_my^m}\right| < 1, \]

relies on the argument just presented.
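
The comparison with the geometric series can be watched numerically. In the following sketch the value y = 3 and the cut-off n = 50 are arbitrary illustrative choices, and the function names are ours; for each m > 2y it prints the remainder $r_{m,n}$, the bound $s_m$, and the cruder bound $2y^m/m!$.

```python
from math import factorial


def remainder(y, m, n):
    """r_{m,n}: the sum of the terms y^j / j! for j = m, ..., m + n."""
    return sum(y ** j / factorial(j) for j in range(m, m + n + 1))


def geometric_bound(y, m):
    """s_m = (y^m / m!) / (1 - y/m); only meaningful when m > y."""
    return (y ** m / factorial(m)) / (1 - y / m)


y = 3.0
for m in (7, 10, 20):                    # every m here exceeds 2y = 6
    r = remainder(y, m, 50)
    s = geometric_bound(y, m)
    print(m, r, s, 2 * y ** m / factorial(m))
```

Each printed row should show the three numbers in increasing order, and all three shrink rapidly as m grows, which is the content of the convergence argument.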


Suppose now that A is a 2 x 2 matrix, $A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}$, in which every entry is less
in magnitude than k/2. Each entry in $A^2 = \begin{pmatrix} a^2 + bc & ab + bd \\ ca + dc & cb + d^2 \end{pmatrix}$ is the sum of two terms,
each of which is smaller in magnitude than $(k/2)^2 = k^2/4$, and thus each entry in
$A^2$ is smaller than $k^2/2$. By a similar argument, each entry in $A^3$ is less than $k^3/2$,
and by induction we can prove that each entry in $A^m$ has absolute value less than
$k^m/2$. Thus when we sum the series

\[ \exp(tA) = I + tA + \frac{t^2}{2!}A^2 + \frac{t^3}{3!}A^3 + \cdots \]

each of the summands of the four entries in the resulting matrix is less in magnitude than the
corresponding summand of the series

\[ 1 + tk + \frac{t^2k^2}{2!} + \frac{t^3k^3}{3!} + \cdots \]

for $\exp(tk)$, which we have just shown to converge absolutely. Hence the series for $\exp(tA)$
converges. A similar argument shows that the
series converges when A is an n x n matrix.

It now follows that the fundamental identity for the exponential function

\[ \exp((s + t)A) = \exp(sA)\exp(tA) \]

holds for matrices.
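
Since the series converges, its partial sums give a practical (if unsophisticated) way to compute $\exp(tA)$ for a 2 x 2 matrix. The sketch below sums sixty terms and tests the fundamental identity numerically; the matrix A, the values of s and t, the number of terms and the function names are all illustrative assumptions.

```python
def mat_mul(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def scale(c, A):
    """The matrix cA."""
    return [[c * x for x in row] for row in A]


def mat_exp(A, terms=60):
    """Partial sum I + A + A^2/2! + ... with `terms` summands."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]          # current summand A^k / k!
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def max_diff(P, Q):
    """Largest absolute difference between corresponding entries."""
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))


A = [[1.0, 2.0], [3.0, -1.0]]                # arbitrary sample matrix
s, t = 0.3, 0.5
lhs = mat_exp(scale(s + t, A))
rhs = mat_mul(mat_exp(scale(s, A)), mat_exp(scale(t, A)))
print(max_diff(lhs, rhs))                    # tiny: rounding error only
```

Summing the raw series is adequate for small matrices with modest entries, as here; it is not meant as a recommendation for serious numerical work.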

You might ask, how about a more general identity of the form

\[ \exp(A + B) = (\exp A)(\exp B) \]

where A and B are arbitrary matrices? To see what is involved, let us expand both
sides:

\[ \exp(A + B) = I + A + B + \tfrac12(A + B)^2 + \cdots = I + A + B + \tfrac12(A^2 + AB + BA + B^2) + \cdots \]

while

\[ (\exp A)(\exp B) = (I + A + \tfrac12A^2 + \cdots)(I + B + \tfrac12B^2 + \cdots) = I + A + B + (\tfrac12A^2 + AB + \tfrac12B^2) + \cdots \]

where $\cdots$ denotes a sum of terms of degree higher than 2 in A and B. If we compare
the quadratic terms, we see that they are not equal unless

\[ \tfrac12(AB + BA) = AB, \]

i.e., unless

\[ AB = BA. \]

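Thus, if the matrices A and B do not commute, there is no reason to expect that
$\exp(A + B) = (\exp A)(\exp B)$ and, in fact, it will not, in general, be true. For a
concrete example, take

\[ A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}. \]

In this case $A^2 = 0$, and hence all the higher order terms in $\exp A$ vanish and we
have the simple expression

\[ \exp A = I + A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}. \]

Take

\[ B = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix} \quad\text{so}\quad B^2 = 0 \]

and

\[ \exp B = I + B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}. \]

Then

\[ (\exp A)(\exp B) = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix} = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}. \]

On the other hand

\[ A + B = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} \quad\text{so}\quad (A + B)^2 = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}. \]

Thus

\[ \exp(A + B) = I + (A + B) + \frac{1}{2!}I + \frac{1}{3!}(A + B) + \frac{1}{4!}I + \cdots = \left(1 + \frac{1}{2!} + \frac{1}{4!} + \cdots\right)I + \left(1 + \frac{1}{3!} + \frac{1}{5!} + \cdots\right)(A + B) \]

so

\[ \exp(A + B) = \frac12\begin{pmatrix} e + e^{-1} & e - e^{-1} \\ e - e^{-1} & e + e^{-1} \end{pmatrix} \neq (\exp A)(\exp B). \]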

The reason that we do have

\[ \exp((s + t)A) = \exp(sA)\exp(tA) \]

is that the matrices sA and tA commute.
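
The failure of $\exp(A + B) = (\exp A)(\exp B)$ for the two nilpotent matrices above can be confirmed by summing the series directly. The helper functions in this sketch are our own illustrative code, with the number of terms chosen generously.

```python
from math import cosh, sinh


def mat_mul(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(A, terms=40):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def max_diff(P, Q):
    return max(abs(P[i][j] - Q[i][j]) for i in range(2) for j in range(2))


A = [[0.0, 1.0], [0.0, 0.0]]
B = [[0.0, 0.0], [1.0, 0.0]]
A_plus_B = [[0.0, 1.0], [1.0, 0.0]]

product = mat_mul(mat_exp(A), mat_exp(B))    # (exp A)(exp B)
exp_of_sum = mat_exp(A_plus_B)               # exp(A + B)
print(product)
print(exp_of_sum)
```

The entries of `exp_of_sum` are $(e \pm e^{-1})/2$, that is, cosh 1 and sinh 1, which is how the test below compares them.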

Having shown that $\exp(tA)$ is well-defined by its power series, we next generalize
the well-known formula

\[ \frac{d}{dt}[\exp(tk)] = k\exp(tk). \qquad (3.4) \]

We define the derivative of $\exp(tA)$ with respect to the real number t by

\[ \frac{d}{dt}[\exp(tA)] = \lim_{h\to0}\frac{1}{h}[\exp((t + h)A) - \exp(tA)]. \]

(The limit on the right-hand side of this equation means that each of the matrix
entries tends to a limit.) Since $\exp((t + h)A) = \exp(hA)\exp(tA)$, we have

\[ \frac{d}{dt}[\exp(tA)] = \lim_{h\to0}\frac{1}{h}[\exp(hA) - I]\exp(tA). \]

But

\[ \exp(hA) = I + hA + \frac{h^2A^2}{2!} + \frac{h^3A^3}{3!} + \cdots \]

so

\[ \frac{1}{h}(\exp(hA) - I) = A + \frac{hA^2}{2!} + \frac{h^2A^3}{3!} + \cdots \]

and

\[ \lim_{h\to0}\frac{1}{h}[\exp(hA) - I] = A. \]

We have thus proved that

\[ \frac{d}{dt}[\exp(tA)] = A\exp(tA), \]

which is just like (3.4) except that the multiplication on the right is now matrix
multiplication.

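Now let $v_0 = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$ be a fixed vector in the plane, and consider the time-
dependent vector $v(t)$ defined by

\[ v(t) = \exp(tA)v_0 \quad\text{where}\quad A = \begin{pmatrix} a & b \\ c & d \end{pmatrix}. \]

We can define $\dot v(t)$, the time derivative of $v(t)$, by

\[ \dot v(t) = \lim_{h\to0}\frac{1}{h}[v(t + h) - v(t)]; \]

if $v(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$, then $\dot v(t)$ is $\begin{pmatrix} \dot x(t) \\ \dot y(t) \end{pmatrix}$. Since $v_0$ is constant,

\[ \dot v(t) = \lim_{h\to0}\frac{1}{h}[\exp((t + h)A) - \exp(tA)]v_0. \]

That is,

\[ \dot v(t) = \frac{d}{dt}[\exp(tA)]v_0 \]

so that

\[ \dot v(t) = A\exp(tA)v_0 = Av(t). \]

We have shown that $v(t)$ satisfies the differential equation

\[ \dot v(t) = Av(t), \qquad v(0) = v_0. \]

Writing $v(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$ we see that

\[ \begin{pmatrix} \dot x(t) \\ \dot y(t) \end{pmatrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}, \]

that is,

\[ \dot x(t) = ax(t) + by(t), \]
\[ \dot y(t) = cx(t) + dy(t). \]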


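Thus

\[ \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \exp(tA)\begin{pmatrix} x_0 \\ y_0 \end{pmatrix} \]

is a solution to the system of linear ordinary differential equations written above.
In fact, it is easy to prove that any solution to the differential equation $\dot v(t) = Av(t)$
is of this form. Suppose that $v(t)$ is such a solution, and consider $w(t) = \exp(-tA)v(t)$.
By the product rule,

\[ \dot w(t) = \frac{d}{dt}(\exp(-tA))v(t) + \exp(-tA)\dot v(t). \]

But

\[ \frac{d}{dt}\exp(-tA) = -A\exp(-tA) \]

and by hypothesis

\[ \dot v(t) = Av(t). \]

Therefore

\[ \dot w(t) = -A\exp(-tA)v(t) + \exp(-tA)Av(t) = 0 \]

since the matrices A and $\exp(-tA)$ commute. It follows that w is a constant vector
(call it $v_0$) and we have

\[ \exp(-tA)v(t) = v_0, \quad\text{so}\quad v(t) = \exp(tA)v_0. \]

We thus see that the function $\exp(tA)$ determines the general solution of the
differential equation $\dot v = Av$. We now turn to various ways of computing $\exp(tA)$.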

3.3. Computing the exponential of a matrix 

Suppose F and G are matrices which are related as

\[ F = BGB^{-1}. \]

Then

\[ F^k = BG^kB^{-1} \]

for any k, and hence it follows from the power series expansion of $\exp(tF)$ that

\[ \exp(tF) = B\exp(tG)B^{-1}. \]

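Case 1. F has distinct real eigenvalues. We can now make use of our
ability to diagonalize 2 x 2 matrices. Suppose that F has distinct real eigenvalues
$\lambda_1$ and $\lambda_2$. Then

\[ F = B\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}B^{-1} \]

and it follows from the power series definition of the exponential function that

\[ \exp\begin{pmatrix} \lambda_1t & 0 \\ 0 & \lambda_2t \end{pmatrix} = \begin{pmatrix} e^{\lambda_1t} & 0 \\ 0 & e^{\lambda_2t} \end{pmatrix}, \quad\text{so}\quad \exp(tF) = B\begin{pmatrix} e^{\lambda_1t} & 0 \\ 0 & e^{\lambda_2t} \end{pmatrix}B^{-1}. \]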


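As a concrete example of this technique, take $F = \begin{pmatrix} 7 & 4 \\ -8 & -5 \end{pmatrix}$. The
characteristic polynomial of this matrix is $\lambda^2 - 2\lambda - 3 = 0$, so the eigenvalues are
$\lambda_1 = 3$, $\lambda_2 = -1$. Considering $(F - 3I) = \begin{pmatrix} 4 & 4 \\ -8 & -8 \end{pmatrix}$, we find eigenvectors
$v_1 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$, the kernel of $F - 3I$, and $v_2 = \begin{pmatrix} -1 \\ 2 \end{pmatrix}$, the image of $F - 3I$. Thus

\[ B = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix} \quad\text{and}\quad F = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} 3 & 0 \\ 0 & -1 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}. \]

It follows immediately that

\[ \exp(tF) = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} e^{3t} & 0 \\ 0 & e^{-t} \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix} \]

and we can multiply out the matrices to obtain

\[ \exp(tF) = \begin{pmatrix} 2e^{3t} - e^{-t} & e^{3t} - e^{-t} \\ -2e^{3t} + 2e^{-t} & -e^{3t} + 2e^{-t} \end{pmatrix}. \]

Given any vector $v_0$ which specifies initial conditions, that is the value of v at t = 0,
we can now write down the solution to $\dot v(t) = Fv(t)$. Suppose, for example, that
$v_0 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$. Then

\[ v(t) = \exp(tF)\begin{pmatrix} 1 \\ 1 \end{pmatrix} = \begin{pmatrix} 3e^{3t} - 2e^{-t} \\ -3e^{3t} + 4e^{-t} \end{pmatrix}. \]

Differentiating each component, we find

\[ \dot v(t) = \begin{pmatrix} 9e^{3t} + 2e^{-t} \\ -9e^{3t} - 4e^{-t} \end{pmatrix} \]

and we confirm that

\[ Fv(t) = \begin{pmatrix} 7 & 4 \\ -8 & -5 \end{pmatrix}\begin{pmatrix} 3e^{3t} - 2e^{-t} \\ -3e^{3t} + 4e^{-t} \end{pmatrix} = \begin{pmatrix} 9e^{3t} + 2e^{-t} \\ -9e^{3t} - 4e^{-t} \end{pmatrix}. \]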
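
The closed form obtained by diagonalization can be compared with a direct summation of the power series. This sketch uses the matrix with rows (7, 4) and (-8, -5), whose eigenvalues are 3 and -1; the sample time t = 0.4, the number of series terms and the helper names are illustrative assumptions.

```python
from math import exp


def mat_mul(P, Q):
    """Product of two 2 x 2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def mat_exp(A, terms=60):
    """Partial sum of the exponential series I + A + A^2/2! + ..."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]
    for k in range(1, terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(2)]
                  for i in range(2)]
    return result


def exp_tF_closed(t):
    """B diag(e^{3t}, e^{-t}) B^{-1}, multiplied out by hand."""
    a, b = exp(3 * t), exp(-t)
    return [[2 * a - b, a - b], [-2 * a + 2 * b, -a + 2 * b]]


F = [[7.0, 4.0], [-8.0, -5.0]]
t = 0.4
series = mat_exp([[t * x for x in row] for row in F])
closed = exp_tF_closed(t)
error = max(abs(series[i][j] - closed[i][j]) for i in range(2) for j in range(2))
print(error)                                 # rounding error only

# Solution with initial condition v0 = (1, 1): v(t) = exp(tF) v0
v = [closed[0][0] + closed[0][1], closed[1][0] + closed[1][1]]
print(v)
```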


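Case 2. Repeated eigenvalues. The method just described works when F has
a repeated eigenvalue $\lambda$ only if F is already the diagonal matrix $\lambda I$, in which case

\[ \exp(tF) = \begin{pmatrix} e^{\lambda t} & 0 \\ 0 & e^{\lambda t} \end{pmatrix}. \]

Otherwise we can write

\[ F = B\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}B^{-1}. \]

To exponentiate $\begin{pmatrix} \lambda t & t \\ 0 & \lambda t \end{pmatrix}$ we write

\[ \begin{pmatrix} \lambda t & t \\ 0 & \lambda t \end{pmatrix} = \lambda tI + \begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix} \]

and make use of the fact that, if matrices C and D commute, then

\[ \exp(C + D) = (\exp C)(\exp D). \]

Since $\lambda I$ commutes with any matrix we have

\[ \exp\begin{pmatrix} \lambda t & t \\ 0 & \lambda t \end{pmatrix} = \exp(\lambda tI)\exp\begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix}. \]

But $\begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix}^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, so we have from the power series for the exponential

\[ \exp\begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} + \begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix} + \text{terms which are all zero}, \]

i.e., $\exp\begin{pmatrix} 0 & t \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix}$, and so

\[ \exp\begin{pmatrix} \lambda t & t \\ 0 & \lambda t \end{pmatrix} = \begin{pmatrix} e^{\lambda t} & 0 \\ 0 & e^{\lambda t} \end{pmatrix}\begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} e^{\lambda t} & te^{\lambda t} \\ 0 & e^{\lambda t} \end{pmatrix}. \]

It follows that

\[ \text{if } F = B\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}B^{-1} \text{ then } \exp(tF) = B\begin{pmatrix} e^{\lambda t} & te^{\lambda t} \\ 0 & e^{\lambda t} \end{pmatrix}B^{-1}. \]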

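As an example of this case, consider the system of equations

\[ \dot x = x + y, \]
\[ \dot y = -x + 3y. \]

The matrix $F = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}$ has characteristic equation $\lambda^2 - 4\lambda + 4 = 0$, with
a double root $\lambda = 2$. Considering $(F - 2I) = \begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix}$ we find the eigenvector
$v_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$ and the vector $v_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ for which $(F - 2I)v_2 = v_1$. Thus

\[ F = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} 2 & 1 \\ 0 & 2 \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix}. \]

Since $\exp\begin{pmatrix} 2t & t \\ 0 & 2t \end{pmatrix} = \begin{pmatrix} e^{2t} & te^{2t} \\ 0 & e^{2t} \end{pmatrix}$ we have

\[ \exp(tF) = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}\begin{pmatrix} e^{2t} & te^{2t} \\ 0 & e^{2t} \end{pmatrix}\begin{pmatrix} 1 & 0 \\ -1 & 1 \end{pmatrix} \]

or

\[ \exp(tF) = \begin{pmatrix} (1 - t)e^{2t} & te^{2t} \\ -te^{2t} & (1 + t)e^{2t} \end{pmatrix}. \]

If, for example, we wish to solve the above differential equations for initial conditions
$x_0 = 2$, $y_0 = 1$, we just form

\[ \begin{pmatrix} x(t) \\ y(t) \end{pmatrix} = \exp(tF)\begin{pmatrix} 2 \\ 1 \end{pmatrix} = \begin{pmatrix} 2e^{2t} - te^{2t} \\ e^{2t} - te^{2t} \end{pmatrix}. \]

Differentiating, we confirm that

\[ \dot x = 4e^{2t} - e^{2t} - 2te^{2t} = x + y, \]
\[ \dot y = 2e^{2t} - e^{2t} - 2te^{2t} = -x + 3y. \]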


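It is not really necessary, however, to find the matrix B explicitly in the case where
F has a repeated eigenvalue $\lambda$. By the Cayley-Hamilton theorem,

\[ (F - \lambda I)^2 = 0 \]

so the matrix $G = F - \lambda I$ is nilpotent. Now, since $\lambda I$ and G commute,

\[ \exp(tF) = \exp(\lambda tI)\exp(tG). \]

But $\exp(tG)$ is easily computed from the power series:

\[ \exp(tG) = I + tG \]

since $(tG)^2$ and all subsequent terms are zero. So

\[ \exp(tF) = \begin{pmatrix} e^{\lambda t} & 0 \\ 0 & e^{\lambda t} \end{pmatrix}(I + tG). \]

For the example $F = \begin{pmatrix} 1 & 1 \\ -1 & 3 \end{pmatrix}$ considered above, $\lambda = 2$ and $G = \begin{pmatrix} -1 & 1 \\ -1 & 1 \end{pmatrix}$, so

\[ \exp(tF) = \begin{pmatrix} e^{2t} & 0 \\ 0 & e^{2t} \end{pmatrix}\begin{pmatrix} 1 - t & t \\ -t & 1 + t \end{pmatrix}, \]

in agreement with the earlier result.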
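
The Cayley-Hamilton shortcut for a repeated eigenvalue translates directly into a few lines of code. In this sketch the function name `exp_repeated` is ours, and the eigenvalue is read off as half the trace, which is valid only under the stated assumption that the 2 x 2 matrix really has a repeated eigenvalue.

```python
from math import exp


def exp_repeated(F, t):
    """exp(tF) = e^{lambda t} (I + tG), assuming F has a repeated eigenvalue.

    With a repeated eigenvalue lambda, trace(F) = 2 * lambda, and
    G = F - lambda * I satisfies G^2 = 0.
    """
    lam = (F[0][0] + F[1][1]) / 2
    G = [[F[0][0] - lam, F[0][1]], [F[1][0], F[1][1] - lam]]
    scale = exp(lam * t)
    return [[scale * ((1.0 if i == j else 0.0) + t * G[i][j]) for j in range(2)]
            for i in range(2)]


F = [[1.0, 1.0], [-1.0, 3.0]]                # repeated eigenvalue 2
t = 0.7
computed = exp_repeated(F, t)
expected = [[(1 - t) * exp(2 * t), t * exp(2 * t)],
            [-t * exp(2 * t), (1 + t) * exp(2 * t)]]
print(computed)
print(expected)
```

No eigenvectors are needed here, which is the point of the shortcut.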



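Case 3. Complex eigenvalues. We have finally to deal with the case where F has
complex eigenvalues. We have proved that, if the eigenvalues of F are $\alpha \pm i\beta$, we
can write $F = BCB^{-1}$ where C is the conformal matrix $C = \begin{pmatrix} \alpha & -\beta \\ \beta & \alpha \end{pmatrix}$. The
problem is now to exponentiate C.

Notice that $C = \alpha I + \beta J$ where $J = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$. The matrix J satisfies
$J^2 = -I$ and corresponds to the complex number $i = \sqrt{-1}$. Because $\alpha I$ and $\beta J$
commute, we have

\[ \exp(tC) = \exp(t\alpha I)\exp(t\beta J). \]

Of course

\[ \exp(t\alpha I) = \begin{pmatrix} e^{\alpha t} & 0 \\ 0 & e^{\alpha t} \end{pmatrix}. \]

To compute $\exp(t\beta J)$, we use the power series for the exponential.
Since $J^2 = -I$, $J^3 = -J$, $J^4 = I$, $J^5 = J$, etc., we have

\[ \exp(t\beta J) = I + t\beta J - \frac{1}{2!}(t\beta)^2I - \frac{1}{3!}(t\beta)^3J + \frac{1}{4!}(t\beta)^4I + \cdots. \]

The coefficient of I is the power series for $\cos\beta t$; the coefficient of J is the power
series for $\sin\beta t$, so we conclude that

\[ \exp(t\beta J) = \cos\beta t\,I + \sin\beta t\,J = \begin{pmatrix} \cos\beta t & -\sin\beta t \\ \sin\beta t & \cos\beta t \end{pmatrix}, \]

which is a time-dependent rotation matrix. Identifying J with the complex number
i, we see also that

\[ e^{i\beta t} = \cos\beta t + i\sin\beta t. \]

Thus

\[ \exp(tC) = \begin{pmatrix} e^{\alpha t} & 0 \\ 0 & e^{\alpha t} \end{pmatrix}\begin{pmatrix} \cos\beta t & -\sin\beta t \\ \sin\beta t & \cos\beta t \end{pmatrix} \]

and, if $F = BCB^{-1}$, we can calculate $\exp(tF)$ as

\[ \exp(tF) = B\exp(tC)B^{-1}. \]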


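As an example, consider $F = \begin{pmatrix} 3 & -10 \\ 2 & -5 \end{pmatrix}$. The characteristic equation is
$\lambda^2 + 2\lambda + 5 = 0$, with roots $\lambda = -1 \pm 2i$. So $F = BCB^{-1}$, where C is the conformal
matrix $\begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix}$. As described in section 2.2, we can choose $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ as
the first column of B, and the second column of B is then the first column of
$F - \alpha I$ divided by $\beta$, that is $\tfrac12\begin{pmatrix} 4 \\ 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$. So $B = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}$ and

\[ F = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}. \]

We now calculate $\exp(tC)$ by the procedure just described:

\[ \exp(tC) = \begin{pmatrix} e^{-t} & 0 \\ 0 & e^{-t} \end{pmatrix}\begin{pmatrix} \cos2t & -\sin2t \\ \sin2t & \cos2t \end{pmatrix}, \]

so that

\[ \exp(tF) = \begin{pmatrix} 1 & 2 \\ 0 & 1 \end{pmatrix}\exp(tC)\begin{pmatrix} 1 & -2 \\ 0 & 1 \end{pmatrix}. \]

Multiplying the matrices, we have

\[ \exp(tF) = \begin{pmatrix} e^{-t}(\cos2t + 2\sin2t) & -5e^{-t}\sin2t \\ e^{-t}\sin2t & e^{-t}(\cos2t - 2\sin2t) \end{pmatrix}. \]

As a check, one can verify directly that

\[ \frac{d}{dt}\exp(tF) = F\exp(tF). \]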

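Again in this case, it is not really necessary to do the decomposition $F = BCB^{-1}$
explicitly. Suppose that F has eigenvalues $\alpha \pm i\beta$ so that its characteristic equation is

\[ (\lambda - \alpha)^2 + \beta^2 = 0. \]

Then, by the Cayley-Hamilton theorem,

\[ (F - \alpha I)^2 + \beta^2I = 0 \]

so the traceless matrix $G = F - \alpha I$ satisfies $G^2 = -\beta^2I$. Writing $F = \alpha I + G$, we
see that

\[ \exp(tF) = \exp(\alpha tI)\exp(tG). \]

Again we exponentiate tG by using the power series:

\[ \exp(tG) = I + tG + \frac{t^2G^2}{2!} + \frac{t^3G^3}{3!} + \frac{t^4G^4}{4!} + \cdots. \]

Using the fact that $G^2 = -\beta^2I$, we obtain

\[ \exp(tG) = I + tG - \frac{(\beta t)^2}{2!}I - \frac{\beta^2t^3}{3!}G + \frac{(\beta t)^4}{4!}I + \cdots. \]

The coefficient of I is again the power series for $\cos\beta t$. The coefficient of G is

\[ t - \frac{\beta^2t^3}{3!} + \cdots = \frac{1}{\beta}\sin\beta t. \]

We conclude that

\[ \exp(tG) = \cos\beta t\,I + [(\sin\beta t)/\beta]\,G \]

and

\[ \exp(tF) = \begin{pmatrix} e^{\alpha t} & 0 \\ 0 & e^{\alpha t} \end{pmatrix}\exp(tG). \]

Returning to the example $F = \begin{pmatrix} 3 & -10 \\ 2 & -5 \end{pmatrix}$ for which $\alpha = -1$, $\beta = 2$, we
form $G = F + I = \begin{pmatrix} 4 & -10 \\ 2 & -4 \end{pmatrix}$, a traceless matrix satisfying $G^2 = -4I$. Then

\[ \exp(tG) = \begin{pmatrix} \cos2t & 0 \\ 0 & \cos2t \end{pmatrix} + \tfrac12\sin2t\begin{pmatrix} 4 & -10 \\ 2 & -4 \end{pmatrix} \]

so that

\[ \exp(tG) = \begin{pmatrix} \cos2t + 2\sin2t & -5\sin2t \\ \sin2t & \cos2t - 2\sin2t \end{pmatrix}. \]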
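
The same shortcut works numerically in the complex case. The sketch below computes $\alpha$ from the trace and $\beta$ from the determinant, under the assumption that the eigenvalues really are complex (so that $\det F - \alpha^2 > 0$); the function name `exp_complex` and the sample time are illustrative.

```python
from math import cos, exp, sin, sqrt


def exp_complex(F, t):
    """exp(tF) = e^{alpha t} (cos(beta t) I + (sin(beta t)/beta) G).

    Assumes F has eigenvalues alpha +/- i*beta with beta > 0, so that
    G = F - alpha*I is traceless and satisfies G^2 = -beta^2 I.
    """
    alpha = (F[0][0] + F[1][1]) / 2
    det = F[0][0] * F[1][1] - F[0][1] * F[1][0]
    beta = sqrt(det - alpha ** 2)            # positive for complex eigenvalues
    G = [[F[0][0] - alpha, F[0][1]], [F[1][0], F[1][1] - alpha]]
    c, s = cos(beta * t), sin(beta * t) / beta
    return [[exp(alpha * t) * (c * (1.0 if i == j else 0.0) + s * G[i][j])
             for j in range(2)] for i in range(2)]


F = [[3.0, -10.0], [2.0, -5.0]]              # eigenvalues -1 +/- 2i
t = 0.9
computed = exp_complex(F, t)
expected = [[exp(-t) * (cos(2 * t) + 2 * sin(2 * t)), -5 * exp(-t) * sin(2 * t)],
            [exp(-t) * sin(2 * t), exp(-t) * (cos(2 * t) - 2 * sin(2 * t))]]
print(computed)
print(expected)
```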
The differential equation $\dot v = Av$, with $v(0) = v_0$, has the unique solution

\[ v(t) = \exp(tA)v_0. \]

This solution $v(t)$ defines a function from the time axis to a two-dimensional vector
space. Because $\exp(tA)$ is defined for negative t as well as positive t, the domain
of $v(t)$ is $-\infty < t < \infty$. By plotting the point whose position vector is v for all
values of t, we obtain a solution curve for the differential equation. This curve is
like the path of a particle which moves in a plane, and the vector $\dot v(t) = Av(t)$,
which is like the velocity vector for that particle, is tangent to the path. Through
each point in the plane there passes a unique solution curve, and the effect of the
transformation $\exp(tA)$ is represented by moving t units along the solution curve.

Figure 3.1 (a solution curve in the xy-plane, with points marked at t = -1, 0, 1, 2)

By plotting a whole family of solution curves, we can create a phase portrait
which conveys the important features of the solutions of the differential equation.
Although there are many different matrices A which could appear in the differential
equation $\dot v = Av$, there are only a limited number of different types of phase portraits.
To be specific, if matrices A and F are conjugate, so that $A = BFB^{-1}$, then the
solution curves for $\dot v = Av$ are obtained from those for $\dot w = Fw$ by the linear
transformation $v(t) = Bw(t)$. The proof is simple: since B is constant, $\dot v(t) = B\dot w(t)$,
and it follows that if $\dot w = Fw$, then

\[ \dot v = B\dot w = BFw = BFB^{-1}v = Av. \]

Thus the phase portraits for $\dot v = Av$ are essentially the same as those for $\dot w = Fw$
if A and F are conjugate. We can therefore determine the possible phase portraits (up
to a linear transformation) by considering the different possibilities for the
eigenvalues of A.

We note first that if $v_0$ is an eigenvector of A, with eigenvalue $\lambda$, then

\[ v(t) = \exp(tA)v_0 = \left[I + tA + \frac{1}{2!}(tA)^2 + \cdots\right]v_0 = \left[1 + t\lambda + \frac{1}{2!}(t\lambda)^2 + \cdots\right]v_0 = e^{\lambda t}v_0. \]

So in this case the solution curve is the straight line through the origin on which
$v_0$ lies. If $\lambda$ is positive, $v(t)$ moves away from the origin as t becomes large and
positive; if $\lambda$ is negative, $v(t)$ moves in toward the origin as $t \to \infty$. If $\lambda = 0$ then
$v(t) = v_0$ for all t, so each point on the line through $v_0$ stays fixed.

We can now enumerate all possible cases. 


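Case 1. A is a multiple of the identity matrix.

Case 1a. $A = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$, $\exp(tA) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$.

So all points stay fixed.

Case 1b. $A = \begin{pmatrix} \lambda & 0 \\ 0 & \lambda \end{pmatrix}$, $\lambda \neq 0$, $\exp(tA) = \begin{pmatrix} e^{\lambda t} & 0 \\ 0 & e^{\lambda t} \end{pmatrix}$.

Figure 3.2(a) $\lambda > 0$.

Figure 3.2(b) $\lambda < 0$.

Every vector is an eigenvector, so all solution curves are straight lines through the
origin. If $\lambda > 0$, each point moves exponentially away from the origin as $t \to \infty$; if
$\lambda < 0$ each point moves towards the origin.

Case 2. A has real distinct eigenvalues $\lambda_1$ and $\lambda_2$.
Then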

h 0\ 
Case 2b. $\lambda_1$ and $\lambda_2$ both negative, $|\lambda_1| < |\lambda_2|$. This is similar to Case 2a, but a

11 ~~1. ' ... 
axis are again both solution curves. As t increases, x becomes larger, y smaller.
Other solution curves approach the x-axis as $t \to \infty$, the y-axis as $t \to -\infty$, as
illustrated in figure 3.4(a).

Figure 3.4(a) $A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}$
More generally, suppose eigenvector $v_1$ corresponds to eigenvalue $\lambda_1 > 0$, while
$v_2$ corresponds to $\lambda_2 < 0$. Points along the line through $v_1$ move
out, those along the line through $v_2$ move in. Solution curves approach the line through
$v_1$ as $t \to +\infty$, the line through $v_2$ as $t \to -\infty$, as illustrated in figure 3.4(b).

Figure 3.4(b) $A = B\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}B^{-1}$


Case 2d. positive, A 2 zero. 
In t 
Figure 3.5(a) $A = \begin{pmatrix} \lambda_1 & 0 \\ 0 & 0 \end{pmatrix}$

$t \to -\infty$. More generally, suppose eigenvector $v_1$ corresponds to $\lambda_1 > 0$, while $v_2$
corresponds to $\lambda_2 = 0$. The line through the origin along $v_2$ is held fixed. Lines
parallel to $v_1$ are solution curves. Points move away from the line through $v_2$ as
$t \to \infty$, toward this line as $t \to -\infty$.



Figure 3.5(b) $A = B\begin{pmatrix} \lambda_1 & 0 \\ 0 & 0 \end{pmatrix}B^{-1}$

Case 2e. $\lambda_1$ is negative, $\lambda_2$ zero. Just reverse all arrows in the preceding case.
As $t \to +\infty$, the entire plane is projected onto the line of eigenvector $v_2$.

Case 3. Repeated eigenvalue $\lambda$, but $A \neq \lambda I$.

\[ A = B\begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}B^{-1}. \]

In this case there is only one eigenvector $v_1$.

Case 3a. $\lambda > 0$.

In the special case where B = I, the eigenvector lies along the x-axis. The
x-axis is a solution curve; points on it move out. As $t \to \infty$, x becomes much




















3.5. Applications of differential equations 

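The best-known physical system which gives rise to a differential equation of the
sort we have just been considering consists of a mass M which moves under the
influence of a spring, which exerts a force $-kx$, and a dashpot, which exerts a
friction force $-Zu$, where x is the displacement and u the velocity. Then

\[ \dot x = u, \]
\[ \dot u = -\frac{k}{M}x - \frac{Z}{M}u. \]

Writing $w = \begin{pmatrix} x \\ u \end{pmatrix}$, we can write this differential equation as

\[ \dot w = Aw \]

where

\[ A = \begin{pmatrix} 0 & 1 \\ -\omega_0^2 & -\Gamma \end{pmatrix}, \qquad \omega_0^2 = \frac{k}{M}, \qquad \Gamma = \frac{Z}{M}. \]

The character of the motion is determined by the eigenvalues of A. Since the
characteristic equation is

\[ \lambda^2 + \Gamma\lambda + \omega_0^2 = 0, \]

we have

\[ \lambda = \frac{-\Gamma \pm \sqrt{\Gamma^2 - 4\omega_0^2}}{2}. \]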
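
The dependence of the eigenvalues on the friction constant is easy to tabulate. In this sketch `omega0` and `gamma` stand for $\omega_0$ and $\Gamma$; the regime labels other than 'undamped' are the conventional physics names and are an assumption of ours rather than terms taken from the text, as are the sample parameter values.

```python
import cmath


def oscillator_eigenvalues(omega0, gamma):
    """Roots of lambda^2 + gamma*lambda + omega0^2 = 0, as complex numbers."""
    root = cmath.sqrt(gamma ** 2 - 4 * omega0 ** 2)
    return (-gamma + root) / 2, (-gamma - root) / 2


def regime(omega0, gamma):
    """Classify the motion by the sign of gamma^2 - 4*omega0^2."""
    disc = gamma ** 2 - 4 * omega0 ** 2
    if gamma == 0:
        return "undamped"                    # purely imaginary eigenvalues
    if disc < 0:
        return "underdamped"                 # complex eigenvalues, negative real part
    if disc == 0:
        return "critically damped"           # repeated real eigenvalue
    return "overdamped"                      # two distinct real negative eigenvalues


for omega0, gamma in [(2.0, 0.0), (2.0, 1.0), (2.0, 4.0), (1.0, 3.0)]:
    print(regime(omega0, gamma), oscillator_eigenvalues(omega0, gamma))
```

Using `cmath.sqrt` lets one formula cover both signs of the discriminant.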


1. If $\Gamma = 0$ (no friction), $\lambda = \pm i\omega_0$, and the oscillator is undamped. The phase
portrait corresponds to case 4a; the solution curves are ellipses in the xu-plane.
conservation of energy. The period of the motion is $2\pi/\omega_0$.













ns implifc 


then x and u remain proportional throughout the subsequent motion.
If $\Gamma^2 > 4\omega_0^2$ the characteristic equation has real negative







we can write 
Figure 3.14

Figure 3.15

the form

\[ \dot v - Av = b(t) \]

with b not identically zero. To see how the term $b(t)$ might arise, consider a driven
oscillator, a mass M acted upon by a spring, a dashpot, and a motor which supplies a
force $F(t)$. Then $\dot x = u$ and $M\dot u = -kx - Zu + F(t)$ so that

\[ \begin{pmatrix} \dot x \\ \dot u \end{pmatrix} - \begin{pmatrix} 0 & 1 \\ -k/M & -Z/M \end{pmatrix}\begin{pmatrix} x \\ u \end{pmatrix} = \begin{pmatrix} 0 \\ F(t)/M \end{pmatrix}. \]

Figure 3.16

To solve the equation $\dot v - Av = b(t)$ we generalize the method called variation
of parameters, which is familiar from the case of
a single variable. Recall that for one dependent variable, to solve

\[ \dot x - kx = b(t), \]

we start from the solution $e^{kt}x_0$ of the homogeneous equation, replace the constant
$x_0$ by a function $u(t)$, and try $x(t) = e^{kt}u(t)$. Substituting into
$\dot x - kx = b(t)$, we obtain

\[ \dot u(t) = e^{-kt}b(t). \]

Then, integrating once with respect to time, we find

\[ u(t) = \int_0^t e^{-ks}b(s)\,ds \]

and finally

\[ x(t) = e^{kt}u(t) = e^{kt}\int_0^t e^{-ks}b(s)\,ds \]

or

\[ x(t) = \int_0^t e^{k(t-s)}b(s)\,ds, \]

a solution which satisfies the initial condition $x(0) = 0$. To obtain the general
solution, satisfying the initial condition $x(0) = x_0$, we simply add on the appropriate
solution to the homogeneous equation, $e^{kt}x_0$, so that the solution to $\dot x - kx = b(t)$
with $x(0) = x_0$ is

\[ x(t) = \int_0^t e^{k(t-s)}b(s)\,ds + e^{kt}x_0. \]


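For the vector equation we proceed in the same way. Since
$\exp(tA)v_0$ is a solution of the homogeneous equation $\dot v - Av = 0$, we replace the
constant vector $v_0$ by a function $w(t)$, and try the solution $v(t) = \exp(tA)w(t)$.
Substituting into $\dot v - Av = b(t)$, we obtain

\[ A\exp(tA)w(t) + \exp(tA)\dot w(t) - A\exp(tA)w(t) = b(t) \]

so that

\[ \dot w(t) = \exp(-tA)b(t). \]

Integrating, we have

\[ w(t) = \int_0^t \exp(-sA)b(s)\,ds \]

and finally

\[ v(t) = \exp(tA)w(t), \]

that is,

\[ v(t) = \int_0^t \exp[(t - s)A]b(s)\,ds. \]

This solution satisfies $v(0) = 0$. For a general initial condition we add on the
solution $\exp(tA)v_0$ of the homogeneous equation, obtaining

\[ v(t) = \int_0^t \exp[(t - s)A]b(s)\,ds + \exp(tA)v_0 \]

which satisfies $v(0) = v_0$.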
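
The integral formula can be tested on the simplest driven system: a free mass (no spring, no dashpot), for which $A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$ and $\exp(tA) = I + tA$ exactly. Everything specific in this sketch is an illustrative assumption: the driving term $F(s)/M = \cos s$, the initial condition, the use of Simpson's rule for the integral, and the function names.

```python
from math import cos, sin


def exp_tA(t):
    """exp(tA) for A = [[0, 1], [0, 0]]; exact because A^2 = 0."""
    return [[1.0, t], [0.0, 1.0]]


def apply(M, v):
    """Matrix times column vector."""
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def b(s):
    """Driving term (0, F(s)/M), with the illustrative choice F(s)/M = cos s."""
    return [0.0, cos(s)]


def simpson(f, lo, hi, n=200):
    """Composite Simpson rule for a vector-valued function (n must be even)."""
    h = (hi - lo) / n
    total = [f(lo)[i] + f(hi)[i] for i in range(2)]
    for j in range(1, n):
        weight = 4 if j % 2 else 2
        fj = f(lo + j * h)
        total = [total[i] + weight * fj[i] for i in range(2)]
    return [h / 3 * x for x in total]


def solution(t, v0):
    """v(t) = integral_0^t exp[(t - s)A] b(s) ds + exp(tA) v0."""
    particular = simpson(lambda s: apply(exp_tA(t - s), b(s)), 0.0, t)
    homogeneous = apply(exp_tA(t), v0)
    return [particular[i] + homogeneous[i] for i in range(2)]


x, u = solution(2.0, [1.0, 0.5])
print(x, u)
```

For this choice the integral can also be done by hand, giving $x(t) = x_0 + u_0t + 1 - \cos t$ and $u(t) = u_0 + \sin t$, which is what the numerical result should reproduce.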

This general solution is not the one which is usually found in discussions of the
forced oscillator in physics textbooks. There it is usually assumed that the driving
term $b(t)$ is sinusoidal with fixed frequency $\omega$; for example,

\[ b(t) = \begin{pmatrix} \sin\omega t + \cos\omega t \\ 3\cos\omega t \end{pmatrix}. \]

Then $b(t)$ satisfies $\ddot b = -\omega^2b$, and we can use integration by parts to evaluate the
integral in the more general solution which we obtained above. The trick is the
same one used to evaluate antiderivatives like $\int e^{-ks}\sin s\,ds$: integrate by parts twice.

Integrating once by parts, and assuming that A is non-singular, we have

\[ v(t) = \int_0^t \exp[(t - s)A]b(s)\,ds = -\Big[\exp[(t - s)A]A^{-1}b(s)\Big]_0^t + \int_0^t \exp[(t - s)A]A^{-1}\dot b(s)\,ds. \]

Integrating again by parts, we have

\[ v(t) = -\Big[\exp[(t - s)A]A^{-1}b(s)\Big]_0^t - \Big[\exp[(t - s)A]A^{-2}\dot b(s)\Big]_0^t + \int_0^t \exp[(t - s)A]A^{-2}\ddot b(s)\,ds. \]

Replacing $\ddot b(s)$ by $-\omega^2b(s)$, we see that the last term is just $-A^{-2}\omega^2v$. Thus

\[ v + A^{-2}\omega^2v = -\Big[\exp[(t - s)A][A^{-1}b(s) + A^{-2}\dot b(s)]\Big]_0^t \]

or

\[ (A^2 + \omega^2I)v = -\Big[\exp[(t - s)A][\dot b(s) + Ab(s)]\Big]_0^t. \]

Unless the eigenvalues of A are $\pm i\omega$, in which case $(A^2 + \omega^2I)$ would be singular,
we can multiply both sides by $(A^2 + \omega^2I)^{-1}$ to obtain the explicit solution

\[ v = -(A^2 + \omega^2I)^{-1}[\dot b(t) + Ab(t) - \exp(tA)(\dot b(0) + Ab(0))]. \]

The term involving $\exp(tA)$ serves only to guarantee that $v(0) = 0$; if $\Gamma > 0$, then
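To check this result, notice that the steady-state solution
$v = -(A^2 + \omega^2I)^{-1}[\dot b(t) + Ab(t)]$ satisfies

\[ \dot v = -(A^2 + \omega^2I)^{-1}[\ddot b(t) + A\dot b(t)] = -(A^2 + \omega^2I)^{-1}[-\omega^2b(t) + A\dot b(t)], \]

\[ Av = -(A^2 + \omega^2I)^{-1}[A\dot b(t) + A^2b(t)], \]

so that $\dot v - Av = (A^2 + \omega^2I)^{-1}(A^2 + \omega^2I)b(t) = b(t)$.
This check shows that the result is correct even if A is a singular
matrix!

Suppose, for example, we wish to solve

\[ \begin{pmatrix} \dot x \\ \dot y \end{pmatrix} - \begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \sin3t \\ \cos3t \end{pmatrix}. \]

Here

\[ b(t) = \begin{pmatrix} \sin3t \\ \cos3t \end{pmatrix} \]

and $\ddot b(t) = -3^2b(t)$, so $\omega = 3$. Then

\[ A^2 + \omega^2I = \begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix}\begin{pmatrix} -1 & -2 \\ 2 & -1 \end{pmatrix} + \begin{pmatrix} 9 & 0 \\ 0 & 9 \end{pmatrix} = \begin{pmatrix} 6 & 4 \\ -4 & 6 \end{pmatrix}, \]

\[ (A^2 + \omega^2I)^{-1} = \frac{1}{26}\begin{pmatrix} 3 & -2 \\ 2 & 3 \end{pmatrix}, \]

and

\[ v = -\frac{1}{26}\begin{pmatrix} 3 & -2 \\ 2 & 3 \end{pmatrix}\left[\begin{pmatrix} 3\cos3t \\ -3\sin3t \end{pmatrix} + \begin{pmatrix} -\sin3t - 2\cos3t \\ 2\sin3t - \cos3t \end{pmatrix}\right] \]

\[ = -\frac{1}{26}\begin{pmatrix} 3 & -2 \\ 2 & 3 \end{pmatrix}\begin{pmatrix} \cos3t - \sin3t \\ -\cos3t - \sin3t \end{pmatrix} = \frac{1}{26}\begin{pmatrix} \sin3t - 5\cos3t \\ 5\sin3t + cos3t \end{pmatrix}. \]

This is the steady-state solution to the original differential equation. Notice that
v oscillates at the same frequency as the driving term, but the amplitude of the
components of v and its phase (the location of the crests and troughs) have been changed.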
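
The steady-state formula and the worked example can both be checked numerically. The sketch below evaluates $-(A^2 + \omega^2I)^{-1}[\dot b + Ab]$ for the matrix with rows (-1, -2) and (2, -1) and $\omega = 3$, compares it with the closed form $\frac{1}{26}(\sin3t - 5\cos3t,\ 5\sin3t + \cos3t)$, and confirms that the closed form satisfies $\dot v - Av = b$; the helper names and sample times are our own illustrative choices.

```python
from math import cos, sin

A = [[-1.0, -2.0], [2.0, -1.0]]
OMEGA = 3.0


def mat_mul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def apply(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]


def inverse(M):
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det], [-M[1][0] / det, M[0][0] / det]]


def b(t):
    return [sin(3 * t), cos(3 * t)]


def b_dot(t):
    return [3 * cos(3 * t), -3 * sin(3 * t)]


A2 = mat_mul(A, A)
K_INV = inverse([[A2[0][0] + OMEGA ** 2, A2[0][1]],
                 [A2[1][0], A2[1][1] + OMEGA ** 2]])


def steady_state(t):
    """v = -(A^2 + omega^2 I)^{-1} [b'(t) + A b(t)]."""
    Ab, bd = apply(A, b(t)), b_dot(t)
    w = apply(K_INV, [bd[0] + Ab[0], bd[1] + Ab[1]])
    return [-w[0], -w[1]]


def closed_form(t):
    s, c = sin(3 * t), cos(3 * t)
    return [(s - 5 * c) / 26, (5 * s + c) / 26]


def residual(t):
    """Components of v' - A v - b for the closed-form v; should vanish."""
    s, c = sin(3 * t), cos(3 * t)
    v_dot = [(3 * c + 15 * s) / 26, (15 * c - 3 * s) / 26]
    Av, bt = apply(A, closed_form(t)), b(t)
    return [v_dot[i] - Av[i] - bt[i] for i in range(2)]


for t in (0.0, 0.4, 1.7):
    print(steady_state(t), closed_form(t), residual(t))
```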

Let us examine what the steady-state behavior is for the case of the physical 
system described at the beginning of the section, with sinusoidal forcing term, so 


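Then

\[ A = \begin{pmatrix} 0 & 1 \\ -\omega_0^2 & -\Gamma \end{pmatrix}, \qquad A^2 + \omega^2I = \begin{pmatrix} \omega^2 - \omega_0^2 & -\Gamma \\ \Gamma\omega_0^2 & \Gamma^2 + \omega^2 - \omega_0^2 \end{pmatrix} \]

and

\[ \mathrm{Det}(A^2 + \omega^2I) = (\omega^2 - \omega_0^2)^2 + \Gamma^2(\omega^2 - \omega_0^2) + \Gamma^2\omega_0^2 = (\omega^2 - \omega_0^2)^2 + \Gamma^2\omega^2. \]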






Summary 


A. The exponential of a matrix 

You should be able to express the exponential function $e^{tX}$ as a formal power
series in the matrix X and to evaluate this function in cases where the powers of
X have a particularly simple form.

You should be able to explain the meaning of the derivative $(d/dt)e^{tX}$ and to
show that it equals $Xe^{tX}$.

B. Linear differential equations

You should be able to show that every solution to $\dot v = Av$ is of the form $v = e^{At}v_0$.

You should be able to calculate $e^{At}$ for any 2 x 2 matrix A and thereby to solve
$\dot v = Av$ for given initial conditions.

By determining the eigenvalues of a 2 x 2 matrix A, you should be able to
identify or sketch a phase portrait that represents solution curves for $\dot v = Av$.

C. Inhomogeneous equations and the harmonic oscillator

You should be able to convert the second-order differential equation that describes
a harmonic oscillator to the form $\dot v - Av = b$, where A is a 2 x 2 matrix and b a
time-dependent vector.

You should be able to solve the above equation and relate the solution to
properties of the behavior of an oscillator such as damping and resonance.


_ Exercises _ 

3.1.(a) Write the power series expansions for $(1 - X)^{-1}$ and for $(1 - X)^{-2}$.

(b) Multiply these two series and compare the general term with the series for
$(1 - X)^{-3}$.


a i N 

4 4 


3.2.(a) Le t F = 


Prove that $F^2 = \tfrac12F$ and that $F^n = F/2^{n-1}$. Using this
result, evaluate the series expansion of $(I - F)^{-1}$. Compute the inverse
directly, and compare.

i 


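(b) Try to evaluate $\begin{pmatrix} \tfrac32 & \tfrac12 \\ \tfrac12 & \tfrac32 \end{pmatrix}^{-1}$ by writing it as $(I + P)^{-1}$ where P is the
projection $\begin{pmatrix} \tfrac12 & \tfrac12 \\ \tfrac12 & \tfrac12 \end{pmatrix}$ and using the series expansion of $(1 + X)^{-1}$. Notice
that although the inverse exists, the series fails to converge.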

a\ 


3.3.(a) The matrix N n/4 . = y 3 has the property that iV^ /4 = 0. Taking 

advantage of this property, evaluate the matrix $F(t) = \exp(tN_{\pi/4})$ and
check explicitly that $F'(t) = N_{\pi/4}F(t)$.




exp (tP) = I + g{t)P. 

Find an expression for the function g{t). 

3.5. Suppose that B is a 2 x 2 matrix which has a repeated eigenvalue $\lambda$.

(a) Show that the matrix $N = B - \lambda I$ is nilpotent (i.e., $N^2 = 0$).

(b) By writing $B = N + \lambda I$ and using the series for the exponential
function, show that

\[ \exp(tB) = (I + tN)\exp(t\lambda I). \]

3.6. Use exercise 3.5 to solve the system of equations

\[ \dot x(t) = x(t) - y(t), \]
\[ \dot y(t) = x(t) + 3y(t) \]

for arbitrary initial conditions $\begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$.



(-4 5) 


^a) 44 = | 

1-2 aJ 



"(fa) 4 = 1 


-i r 


Let A be a 2 x 2 matrix which has two distinct real eigenvalues $\lambda_1$ and $\lambda_2$,
with associated eigenvectors $v_1$ and $v_2$.

(a) Show that the matrix $P_1 = (A - \lambda_2I)/(\lambda_1 - \lambda_2)$ is a projection onto the
line determined by the eigenvector $v_1$: $P_1^2 = P_1$, the image of $P_1$ is the
set of multiples $\lambda v_1$, and the kernel of $P_1$ is the set of multiples $\lambda v_2$.

(b) Similarly $P_2 = (A - \lambda_1I)/(\lambda_2 - \lambda_1)$ is a projection onto the line
determined by $v_2$. Show that $P_1P_2 = P_2P_1 = 0$, that $P_1 + P_2 = I$, and that
$\lambda_1P_1 + \lambda_2P_2 = A$.

(c) By using the power series for the exponential, show that

\[ \exp(t\lambda_1P_1 + t\lambda_2P_2) = e^{\lambda_1t}P_1 + e^{\lambda_2t}P_2. \]

(d) Use this result to solve the equations

\[ \dot x(t) = -4x(t) + 5y(t), \]
\[ \dot y(t) = -2x(t) + 3y(t) \]

for arbitrary initial conditions.


3.9. Let $A$ be a 2 x 2 matrix whose trace is 0 and whose determinant is 1.

(a) Write down the characteristic equation of $A$, and state what this
implies about $A^2$.

(b) Using the power series expansion of the exponential function, develop
an expression for $\exp(tA)$ of the form

$$\exp(tA) = F(t)I + G(t)A$$

where $F(t)$ and $G(t)$ involve trigonometric functions of $t$.

(c) The solution curve for the equation $\dot{v} = Av$, with initial condition
$v = v_0$, is an ellipse as shown in figure 3.19. Prove that all chords
joining $\exp(tA)v_0$ to $\exp(-tA)v_0$ are parallel to $Av_0$ and that the
midpoint of each such chord lies on the diameter of the ellipse on
which $v_0$ lies.

Figure 3.19

3.10. Suppose that $G$ is a matrix whose trace is zero and whose determinant
is $-\beta^2$.

(a) According to the Cayley-Hamilton theorem, what does $G^2$ equal?

(b) Using the power series for the exponential function, show that $\exp G + \exp(-G)$
is a multiple of the identity matrix. Find a function $f$ such
that

$$\exp(G) + \exp(-G) = f(\beta)I.$$

(c) By multiplying the above identity by $\exp G$ and applying the Cayley-Hamilton
theorem, show that $\mathrm{Det}(\exp(G)) = 1$, and find an expression
for the trace of $\exp G$.

(d) Let F = aI + G. Using the above results, show that Det (exp F) = e ( 

3.11. For each of the following differential equations, determine which of the 

phase portraits given in cases 1 through 4c best represents the nature of the 







(c) $\dot{x} = 4x - 5y$,
$\dot{y} = 4x - 4y$.

(d) $\dot{x} = 2x + y$,
$\dot{y} = -x + 4y$.

(e) $\dot{x} = x - 5y$,
$\dot{y} = 2x - 5y$.

(f) $\dot{x} = -2x + 4y$,
$\dot{y} = -x + 2y$.


3.12. For each of the following differential equations, determine which of the 
phase portraits given in cases 1 through 4c best represents the general 
solution, then solve the equation completely for initial conditions 
/x n \ /- 2\ 

at t = 0. 


W V 

(a) x = 3y, 

y = x — 2y. 

(b) x = — x + y. 

y—— 5xjT 3 y, _ 

(c) x — 3y 4- v 

-^' _r?- 

y — ^ I 

y _ C V i Am 

x — — jx 4 y, 

y— — 8x 4- ly. 

N 

1 

X 

rt 

1 

II 

X 

y — 5x + 2y 

(f) x = x + 2 y. 

V = 2x — 4y. 


3.13. By generalizing what you know about calculating and using the 
exponential of a 2 x 2 matrix to the 3x3 case, solve the differential 
equations 


$$\dot{x} = y,$$
$$\dot{y} = z,$$
$$\dot{z} = -6x - 11y - 6z$$


for initial conditions 




at t = 0. 


(Note: The one tricky new step is inverting a 3 x 3 matrix. If you regard
this as the problem of solving three sets of simultaneous linear equations,
you can do it by brute force.)

3.14. By generalizing the techniques which you already know, solve the
equations

$$\dot{x} = x + y - z,$$
$$\dot{y} = -x + 5y + z,$$
$$\dot{z} = -2x + 2y + 4z,$$


= 1 • 


3.15. By introducing the variable $v = \dot{x}$, convert the second-order differential
equation

$$\ddot{x} + 4\dot{x} + 5x = 0$$

to a pair of first-order equations, then solve these equations for arbitrary
initial conditions $\begin{pmatrix} x_0 \\ v_0 \end{pmatrix}$.

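3.16.(a) The differential equation for a critically damped harmonic oscillator,
expressed in units chosen so that $\omega_0 = 1$, is

$$\ddot{x} + 2\dot{x} + x = 0.$$

Solve this equation by matrix methods, introducing $v = \dot{x}$ as a new
variable. Write down the solution for initial conditions $\begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} x_0 \\ 0 \end{pmatrix}$ and
$\begin{pmatrix} x \\ v \end{pmatrix} = \begin{pmatrix} 0 \\ v_0 \end{pmatrix}$.
Show that $x = 0$ or $v = 0$ can occur at most once.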

(b) On e way of solving the above equation without having to contend with i 


corresponds to using a slightly weaker 


matrix with complex eigenvalues, then let $\varepsilon \to 0$. Do this, again showing
what happens to the solutions for initial conditions $\begin{pmatrix} x_0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ v_0 \end{pmatrix}$, and
to the phase portraits, as $\varepsilon \to 0$.

3.17. Consider the function cos tx. 

(a) Show, by use of formal power series, that

$$\frac{d^2}{dt^2}(\cos tx) = -x^2 \cos tx$$

and that

$$\frac{d}{dt}(\cos tx) = 0 \quad \text{for } t = 0.$$

(b) Suppose that $\begin{pmatrix} \ddot{x} \\ \ddot{y} \end{pmatrix} = -B\begin{pmatrix} x \\ y \end{pmatrix}$, where $B$ is a matrix which has a square
root $A$. Show that $\cos(tA)\begin{pmatrix} x(0) \\ y(0) \end{pmatrix}$ is a solution to the second-order
system of equations

$$\frac{d^2}{dt^2}v(t) = -A^2 v(t)$$

with initial conditions $v(0) = v_0$ and $dv/dt(0) = 0$.


_/_5_3\_ 

(c) Let B =

f 2 2 

. Find a matrix A, with positive eigenvalues, such 


\^2 !/ 


that A 2 = B. (Hint: diagonalize B .) 

(d) For the matrix $A$ which you have just constructed, compute the matrix
$\cos(tA)$. (Hint: You have already diagonalized $A$. Use a procedure
similar to that for computing $\exp(tA)$.)

(e) Use the above results to solve the equations 

x= -jx+jy, 

y = 2 x - iy 

for initial conditions $\dot{x}(0) = \dot{y}(0) = 0$, $x(0) = x_0$, $y(0) = y_0$.

3.18. Consider the system of differential equations

$$\dot{x} = 4\beta x - y,$$
$$\dot{y} = 9x + \beta y,$$

where $\beta$ is a real-valued parameter.

(a) Solve the system for arbitrary initial conditions and $\beta = 0$.

(b) Find two critical values of the parameter, $\beta_1 < 0$ and $\beta_2 > 0$, at which
the nature of the solution changes. Discuss the solutions for x and

__ 



(a) Find matrices $D$ and $B$ so that $A = BDB^{-1}$.

(b) Construct the solution to the differential equation $\dot{v} = Av$ for arbitrary
initial conditions $v_0 = \begin{pmatrix} x_0 \\ y_0 \end{pmatrix}$ when $t = 0$. Please remember that $e^0 = 1$.

(c) Sketch a phase portrait for the equation $\dot{v} = Av$. Determine the image
and kernel of the matrix

$$F = \lim_{t \to \infty} \exp(At),$$

and explain their significance in relation to the phase portrait.

(d) By using the trial solution v = exp(Af)w, construct a solution to the 

differential equation v — A\ = 

3.20.(a) By introducing $u = \dot{x}$ as a new variable, convert

$$\ddot{x} + 2\dot{x} - 3x = 3\sin 2t + 2\cos 2t$$



to an equation of the form 


© 


-A 


( x\ 


W 
Chapter 4 is devoted to the study of scalar products and
quadratic forms. It is rich in physical applications, including a
discussion of normal modes and a detailed treatment of
special relativity.


4.1. The Euclidean scalar product

In an affine plane, as you will recall, we have only a very restricted notion of
length: we can compare lengths of segments of parallel lines, but not lengths of 
segments along lines which are not parallel. For example, it is meaningful to say 
that the length of QR (or Q'R ') is twice the length of PQ in figure 4.1, but we 




Figure 4.1 


A Euclidean plane is an affine plane endowed with a distance function which
assigns to every pair of points a non-negative real number, $D(P, Q)$, called the
distance between them. This distance function is compatible with the limited notion
of length in affine geometry; e.g., $D(Q', R') = 2D(P, Q)$ in figure 4.1, but it also
permits us to compare lengths of nonparallel segments such as $PQ$ and $PQ'$. In
the Euclidean plane $\mathbb{R}^2$, the distance function is defined by the well-known formula

$$D(P, Q) = \sqrt{(x_Q - x_P)^2 + (y_Q - y_P)^2}.$$

A Euclidean transformation $f: \mathbb{R}^2 \to \mathbb{R}^2$ is an affine
transformation which preserves this distance function: i.e., $D(f(P), f(Q)) = D(P, Q)$.

Turning our attention to the Euclidean vector space of displacements in the
Euclidean plane, we see that the distance function provides a way of assigning a
length to each vector: the length is simply the distance from head to tail. We
denote the length of a vector $v$ by $\|v\|$. Clearly, if $v = \begin{pmatrix} x \\ y \end{pmatrix}$, then $\|v\| = \sqrt{x^2 + y^2}$.

Figure 4.2

In general, the linear transformations of the vector space $\mathbb{R}^2$ do not preserve
length. Those which do preserve length are
called orthogonal transformations: they are all either rotations about the origin
or reflections in lines through the origin.

Figure 4.3

Since a Euclidean transformation of the plane preserves length, it carries every
triangle into a congruent triangle and hence preserves angles as well as lengths. In
particular, the notion of 'perpendicular' makes good sense in Euclidean geometry
(though not in affine geometry). We say that two vectors $v$ and $w$ are perpendicular or
orthogonal if the triangle which they define satisfies the Pythagorean theorem: i.e., if

$$\|v\|^2 + \|w\|^2 = \|v - w\|^2.$$

Figure 4.4

In terms of length and angle, we can now define the Euclidean scalar product
of two vectors. If $v = \begin{pmatrix} x \\ y \end{pmatrix}$ and $v' = \begin{pmatrix} x' \\ y' \end{pmatrix}$ are two vectors, their scalar product,
$(v, v')$, is defined as

$$(v, v') = \|v\| \, \|v'\| \cos\theta,$$

where $\theta$ is the angle between the two vectors.

Figure 4.5

The geometric meaning of the scalar product is the
following: we take the projection of $v'$ onto the line through $v$; we then multiply
the length of this projected vector with the length of $v$, with a plus or minus sign
according as to whether the projected vector points in the same or in the opposite
direction as $v$. Since the scalar product is defined entirely in terms of the Euclidean
geometry of the plane, any linear transformation which preserves length must also
preserve the scalar product: any such linear transformation preserves lengths of
vectors and the angle between them, hence preserves their scalar product. In other
words, if $M$ is any orthogonal transformation then


$$(Mv, Mv') = (v, v')$$

for any pair of vectors $v$ and $v'$. We shall give a more algebraic proof of this fact
(cf. eq. (4.1)) below. Conversely, since $(v, v) = \|v\|^2$, any $M$ which satisfies the
above equation for all $v$ and $v'$ is certainly orthogonal. Suppose we hold $v$ fixed
and consider $(v, v')$ as a function of $v'$. We claim that $(v, v')$ is a linear function of
$v'$; i.e. that

$$(v, av' + bw') = a(v, v') + b(v, w').$$



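We can see this most simply as follows. Suppose that we first consider the special
case where $v = \begin{pmatrix} c \\ 0 \end{pmatrix}$ lies on the x-axis. Then $(v, v') = cx'$ for $v' = \begin{pmatrix} x' \\ y' \end{pmatrix}$. This
expression clearly depends linearly on $v'$, so we have verified the above assertion
for this special case.

In the general case, we can find a rotation $M$ such that $Mv$ lies on the x-axis.
Since $M$ is orthogonal, $(v, v') = (Mv, Mv')$. By the special case this depends
linearly on $Mv'$, and $Mv'$ depends linearly on $v'$, so we are done. To repeat the
argument in more detail:

$$\begin{aligned}
(v, av' + bw') &= (Mv, M(av' + bw')) && \text{since } M \text{ is orthogonal} \\
&= (Mv, aMv' + bMw') && \text{since } M \text{ is linear} \\
&= a(Mv, Mv') + b(Mv, Mw') && \text{because we have verified this in the special case that } Mv \text{ lies on the x-axis} \\
&= a(v, v') + b(v, w') && \text{since } M \text{ is orthogonal.}
\end{aligned}$$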

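Since the scalar product $(v, v')$ is symmetrical in $v$ and $v'$, we see that $(v, v')$ is also
linear as a function of $v$ when we hold $v'$ fixed. These two facts allow us to write
down the formula for the scalar product: write $v = \begin{pmatrix} x \\ y \end{pmatrix} = x\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y\begin{pmatrix} 0 \\ 1 \end{pmatrix}$ and
$v' = x'\begin{pmatrix} 1 \\ 0 \end{pmatrix} + y'\begin{pmatrix} 0 \\ 1 \end{pmatrix}$. Now the scalar product of $\begin{pmatrix} 1 \\ 0 \end{pmatrix}$ with $\begin{pmatrix} 0 \\ 1 \end{pmatrix}$
vanishes since the vectors are orthogonal, and each of these basis vectors has
length one. So, writing $\epsilon_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ and $\epsilon_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$ for brevity,

$$\begin{aligned}
(v, v') &= x(\epsilon_1, v') + y(\epsilon_2, v') && \text{using linearity in } v \\
&= xx'(\epsilon_1, \epsilon_1) + xy'(\epsilon_1, \epsilon_2) + yx'(\epsilon_2, \epsilon_1) + yy'(\epsilon_2, \epsilon_2) && \text{using linearity in } v' \\
&= xx' + yy'.
\end{aligned}$$

We have thus found a convenient formula for the scalar product in the
plane:

$$(v, v') = xx' + yy'.$$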


We can summarize the important properties of the Euclidean scalar product as 
follows: 

(1) Symmetry: $(v, v') = (v', v)$.

(2) Bilinearity: $(v, av' + bw') = a(v, v') + b(v, w')$.

(3) Positive definiteness: $(v, v) \geq 0$, and $(v, v) = 0$ only if $v = 0$.


Using these properties, it is easy to express the scalar product in terms of length.
Just consider

$$\|v - w\|^2 = (v - w, v - w).$$

Because the scalar product is linear in each factor,

$$\|v - w\|^2 = (v, v) - (v, w) - (w, v) + (w, w).$$

But since $(v, v) = \|v\|^2$, $(w, w) = \|w\|^2$ and $(w, v) = (v, w)$, we have

$$2(v, w) = \|v\|^2 + \|w\|^2 - \|v - w\|^2$$

and so

$$(v, w) = \tfrac{1}{2}\{\|v\|^2 + \|w\|^2 - \|v - w\|^2\}. \qquad (4.1)$$

This formula makes it clear that the Euclidean scalar product follows immediately
from the Euclidean notion of length. If you write $(v, w) = \|v\| \, \|w\| \cos\theta$ and look
at figure 4.6 you will see that (4.1) is nothing more than the 'law of cosines' in
disguise.
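As a quick numerical illustration of (4.1), the following minimal sketch compares the coordinate formula $xx' + yy'$, the length formula (4.1), and the expression $\|v\| \, \|w\| \cos\theta$. The two sample vectors and the helper names are arbitrary choices made for this illustration, not taken from the text.

```python
import math

def dot(v, w):
    """Euclidean scalar product (v, w) = x x' + y y' in the plane."""
    return v[0] * w[0] + v[1] * w[1]

def norm_sq(v):
    """Squared length ||v||^2 = (v, v)."""
    return dot(v, v)

def dot_from_lengths(v, w):
    """Right-hand side of (4.1): (||v||^2 + ||w||^2 - ||v - w||^2) / 2."""
    diff = (v[0] - w[0], v[1] - w[1])
    return 0.5 * (norm_sq(v) + norm_sq(w) - norm_sq(diff))

# Arbitrary sample vectors (an assumption of this sketch).
v = (3.0, 1.0)
w = (-2.0, 4.0)

coordinate_form = dot(v, w)
length_form = dot_from_lengths(v, w)

# Angle from v to w, measured via the polar angles of the two vectors.
theta = math.atan2(w[1], w[0]) - math.atan2(v[1], v[0])
cosine_form = math.sqrt(norm_sq(v)) * math.sqrt(norm_sq(w)) * math.cos(theta)

print(coordinate_form, length_form, cosine_form)
```

All three numbers agree up to rounding, which is the content of the 'law of cosines' remark.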



4.2. The Gram-Schmidt process


Let $V$ be an abstract two-dimensional vector space and suppose that we are given
a positive-definite scalar product, $(\ ,\ )_V$ on $V$. That is, suppose we are given a
function which assigns to each pair of vectors $v_1, v_2$ in $V$ a real number $(v_1, v_2)_V$
and which satisfies the conditions of symmetry, bilinearity and positive definiteness.

We claim that there exists a linear isomorphism* $L: V \to \mathbb{R}^2$ such that

$$(v_1, v_2)_V = (Lv_1, Lv_2).$$

In other words, by the correct choice of a basis on $V$, we can arrange that the
scalar product $(\ ,\ )_V$ on $V$ looks just like the Euclidean scalar product $(\ ,\ )$ on $\mathbb{R}^2$.
To prove this, choose some non-zero vector $w$ in $V$. Since $\|w\|_V^2 = (w, w)_V > 0$, the
vector

$$e_1 = \frac{w}{\|w\|_V}$$

has unit length, i.e.,

$$\|e_1\|_V^2 = (e_1, e_1)_V = \frac{1}{\|w\|_V^2}(w, w)_V = 1.$$

* We recall that the word isomorphism means that $L$ is linear, is one to one, surjective (and
therefore has a linear inverse), see §1.12.



Now let $u$ be any vector in $V$ which is linearly independent of $e_1$. We know that
such a $u$ exists since $V$ is two-dimensional. Let

$$u_2 = u - (u, e_1)_V e_1.$$

We observe that $u_2$ is orthogonal to $e_1$, that is,

$$(u_2, e_1)_V = 0.$$

Indeed,

$$\begin{aligned}
(u_2, e_1)_V &= (u - (u, e_1)_V e_1, e_1)_V \\
&= (u, e_1)_V - (u, e_1)_V (e_1, e_1)_V \\
&= (u, e_1)_V - (u, e_1)_V = 0
\end{aligned}$$

since $(e_1, e_1)_V = 1$. Also $u_2 \neq 0$, for otherwise $e_1$ and $u$ would be linearly
dependent. Now set

$$e_2 = \frac{1}{\|u_2\|_V} u_2.$$

Then $(e_2, e_1)_V = (1/\|u_2\|_V)(u_2, e_1)_V = 0$, and $\|e_2\|_V = 1$. We will use $e_1, e_2$ as our basis
of $V$. The most general vector in $V$ can be written as

$$v = xe_1 + ye_2.$$

Notice that

$$x = (v, e_1)_V$$

since $(e_2, e_1)_V = 0$ and $(e_1, e_1)_V = 1$. Similarly

$$y = (v, e_2)_V.$$

Suppose that

$$v_1 = x_1e_1 + y_1e_2$$

and

$$v_2 = x_2e_1 + y_2e_2$$

so that the map $L: V \to \mathbb{R}^2$ determined by our basis satisfies

$$Lv_1 = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix} \quad \text{and} \quad Lv_2 = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}.$$

Then

$$\begin{aligned}
(v_1, v_2)_V &= (x_1e_1 + y_1e_2, x_2e_1 + y_2e_2)_V \\
&= x_1x_2 + y_1y_2 \quad \text{as } (e_1, e_2)_V = (e_2, e_1)_V = 0 \text{ and } (e_1, e_1)_V = (e_2, e_2)_V = 1 \\
&= (Lv_1, Lv_2).
\end{aligned}$$

This is what we wanted to prove.
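The construction can be tried out numerically. The sketch below is only an illustration under stated assumptions: the positive-definite scalar product `sp` (with matrix entries 2, 1, 1, 3) and the starting vectors are hypothetical choices, and the function names are not from the text. It builds $e_1, e_2$ as above and checks that the coordinates $x = (v, e_1)_V$, $y = (v, e_2)_V$ turn $(\ ,\ )_V$ into the Euclidean scalar product.

```python
import math

def sp(v, w):
    """A hypothetical positive-definite scalar product ( , )_V on pairs of numbers."""
    return 2 * v[0] * w[0] + v[0] * w[1] + v[1] * w[0] + 3 * v[1] * w[1]

def scale(c, v):
    return (c * v[0], c * v[1])

def sub(v, w):
    return (v[0] - w[0], v[1] - w[1])

def gram_schmidt(w, u):
    """Return (e1, e2), orthonormal with respect to sp, built from w and u."""
    e1 = scale(1.0 / math.sqrt(sp(w, w)), w)      # normalize w
    u2 = sub(u, scale(sp(u, e1), e1))             # remove the component along e1
    e2 = scale(1.0 / math.sqrt(sp(u2, u2)), u2)   # normalize the remainder
    return e1, e2

def L(v, e1, e2):
    """Coordinates of v in the basis e1, e2: x = (v, e1)_V, y = (v, e2)_V."""
    return (sp(v, e1), sp(v, e2))

e1, e2 = gram_schmidt((1.0, 0.0), (0.0, 1.0))

# Two arbitrary test vectors (assumed values).
v1, v2 = (2.0, -1.0), (0.5, 4.0)
a, b = L(v1, e1, e2), L(v2, e1, e2)
euclidean = a[0] * b[0] + a[1] * b[1]

print(sp(v1, v2), euclidean)
```

Both printed numbers coincide up to rounding, as the identity $(v_1, v_2)_V = (Lv_1, Lv_2)$ predicts.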

On $\mathbb{R}^3$ we can define the Euclidean scalar product by

$$\left( \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} \right) = xx' + yy' + zz'.$$

Again, it is clear that if $v = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$ then

$$\|v\|^2 = (v, v)$$

represents the square of the Euclidean length of the vector $v$. The argument given
above shows that we can recover the scalar product from the length by the same
formula, (4.1):

$$(v, w) = \tfrac{1}{2}(\|v\|^2 + \|w\|^2 - \|v - w\|^2).$$

So any rotation of three-dimensional space preserves the scalar product. In particular,
if we are given two vectors, $v$ and $w$, we can rotate the plane that they
span into the $z = 0$ plane. For vectors in that plane, the scalar product reduces to
the scalar product for $\mathbb{R}^2$. For such vectors we know that

$$(v, w) = \|v\| \, \|w\| \cos\theta$$

and hence (since both sides are invariant under rotation) it is true for all pairs of
vectors.

A vector space $V$ is called three-dimensional if every four vectors are linearly
dependent but there are three vectors which are linearly independent. Thus given
any four vectors we can find four numbers $a_1, a_2, a_3, a_4$, not all zero,
such that

$$a_1v_1 + a_2v_2 + a_3v_3 + a_4v_4 = 0$$

but there exist three vectors $u, v, w$ such that

$$au + bv + cw = 0$$

holds only if $a = b = c = 0$. Suppose that $V$ has a positive-definite scalar product
$(\ ,\ )_V$. We can now repeat the argument given above for the two-dimensional case.
Pick some non-zero vector. By multiplying by a scalar, we can arrange that it has
unit length. Call it $e_1$. Choose some vector $u$ so that $e_1$ and $u$ are linearly
independent. Set

$$u_2 = u - (u, e_1)_V e_1$$

and

$$e_2 = \frac{1}{\|u_2\|_V} u_2.$$

Then $e_1$ and $e_2$ satisfy

$$\|e_1\|_V = \|e_2\|_V = 1 \quad \text{and} \quad (e_1, e_2)_V = 0.$$

The set of all vectors of the form $xe_1 + ye_2$ is isomorphic to $\mathbb{R}^2$ and hence is a
two-dimensional vector space. Thus it cannot be all of $V$. (We cannot find three
linearly independent vectors in this set.) Thus there must be some vector $w$ in $V$
which is not of the form $xe_1 + ye_2$. Thus

$$w_3 = w - (w, e_1)_V e_1 - (w, e_2)_V e_2$$

is not zero. Set

$$e_3 = \frac{1}{\|w_3\|_V} w_3.$$

Then

$$\|e_1\|_V = \|e_2\|_V = \|e_3\|_V = 1,$$
$$(e_1, e_2)_V = (e_1, e_3)_V = (e_2, e_3)_V = 0.$$

If $v$ is any vector in $V$, we claim that

$$v - (v, e_1)_V e_1 - (v, e_2)_V e_2 - (v, e_3)_V e_3 = 0.$$

Indeed, by the same argument as before, if we set

$$v_4 = v - (v, e_1)_V e_1 - (v, e_2)_V e_2 - (v, e_3)_V e_3$$

then

$$(v_4, e_1)_V = (v_4, e_2)_V = (v_4, e_3)_V = 0.$$

But this means that if $v_4 \neq 0$ the vectors $e_1, e_2, e_3, v_4$ would be linearly independent:
indeed, taking the scalar product of

$$a_1e_1 + a_2e_2 + a_3e_3 + a_4v_4 = 0$$

with $e_1, e_2$ and $e_3$ shows that $a_1 = 0$, $a_2 = 0$, $a_3 = 0$. Thus, if $v_4 \neq 0$, $a_4 = 0$. This
contradicts the assumption that $V$ is three-dimensional.

Thus every vector in $V$ can be written as

$$v = xe_1 + ye_2 + ze_3 \quad \text{where} \quad x = (v, e_1)_V, \ y = (v, e_2)_V, \ z = (v, e_3)_V.$$

Just as in the two-dimensional case, we can define the map

$$L: V \to \mathbb{R}^3, \qquad Lv = \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$

This map is a linear, one-to-one map of $V$ onto $\mathbb{R}^3$ and

$$(u, v)_V = (Lu, Lv)_{\mathbb{R}^3}.$$

It is clear that we can prove the same sort of result in four, five, ..., n dimensions.
On $\mathbb{R}^n$ define the Euclidean scalar product

$$(x, w) = x_1w_1 + \cdots + x_nw_n.$$

A vector space $V$ is called n-dimensional if there exist $n$ linearly independent vectors
but every collection of $n + 1$ vectors is linearly dependent. We shall study the
general theory of n-dimensional vector spaces in Chapter 10. If $V$ is n-dimensional
and has a positive-definite scalar product, then we can find an orthonormal basis





so 


V( v i> v i) = 2 


and 


/ ? \ 




m 


Next we subtract the component of $v_2$ along $e_1$:

3\ //3\ / t\\ 


w, =1 


i\ 
1 
2 
1 

3/ \\ 3/ \-i// \4/ 

/ 3\ 


w, = 


3 

5 

\ 3 / 


-(f + i+t-f) 



i i\ 


l l \ 

!) 

1 i' 

2 


f i 

1 

2 


,3 


\-j/ 


V 


Finally we convert w 2 to a unit vector: 


1 

IV 

I 



1 


_ 

( W 2> W 2) = ( 

3 

9 ' 

, ^ 1 + 3 2 + 5 2 = 36 

V 

1 c 

, i 

zl 


1 vu 

so 


jv 

i 

W-7 1 

U 


_ 2 _ 


. 

6 6 

-_i 



5 / 


/ 

7 p 

/ 

iV\ 



/ 1 

/ 

1 YT 


Ac a nntp t licit 

1 

1 

i 

\ — Q 


1 

5 I 

3 


\ 

U 

\ 

5 // 



Now we can easily write any vector $v$ in $\mathbb{R}^4$ as the sum of a vector $\pi v$ which is a
linear combination of $e_1$ and $e_2$ (and hence of $v_1$ and $v_2$) and a vector which is
perpendicular to both $e_1$ and $e_2$. Consider, for example,

li\ 


\-u 

\ 7/ 

' \ / 

Define $\pi v$ by

$$\pi v = (v, e_1)e_1 + (v, e_2)e_2.$$


i 

7 4\ 

, / i\\ 

1 0 

/ 

7 4 ^ 

/ 1 W / i\ 


i 

o 


i 

i 

1 


0 


Tf 

rT 1 


7TV = - 

1 




+77 

1 





4 

— 1 

_-J 


_ JL 

! j 

301 

— i 


rtl 



\ 

A 7 / 

\-v/ 

\- a / \ 

17/ 


J7i 









t=^ 


+ 36'36 


m 


7TV = 7l - 


+ 




w \ v w 


Then you can check that 



Uft 




V — 7IV = 


0 

2 

\ 6 / 


0 

-3 

\ 1/ 


is orthogonal to $e_1$ and $e_2$, either by verifying that it is orthogonal to the original
basis vectors $v_1$ and $v_2$ or to the orthonormal vectors $e_1$ and $e_2$. We say that
the transformation $\pi$ sending $v$ into $\pi v$ is orthogonal projection onto the subspace
$W$ spanned by $v_1$ and $v_2$.

As a second example of the Gram-Schmidt process, consider the (four-dimensional)
space of polynomials of degree $\leq 3$, with scalar product

$$(f, g) = \int_{-1}^{1} f(t)g(t)\,dt.$$

(Check that this defines a scalar product!) We start with the ordered basis

$$v_1 = 1, \quad v_2 = t, \quad v_3 = t^2, \quad v_4 = t^3.$$

If we started with different basis elements, or even the same elements in a different
order, we would end up with a different orthonormal basis. We first calculate

$$(v_1, v_1) = \int_{-1}^{1} dt = 2$$

and convert $v_1$ to a unit vector:

$$e_1 = v_1/\sqrt{(v_1, v_1)} = 1/\sqrt{2}.$$

We next calculate

$$(e_1, v_2) = \int_{-1}^{1} (t/\sqrt{2})\,dt = 0$$

and conclude that $v_2$ is already orthogonal to $e_1$. Since

$$(v_2, v_2) = \int_{-1}^{1} t^2\,dt = 2/3$$

we have $e_2 = t/\sqrt{2/3} = \sqrt{3/2}\,t$.

Next we calculate $w_3$:

$$\begin{aligned}
w_3 &= v_3 - e_1(e_1, v_3) - e_2(e_2, v_3) \\
&= t^2 - \tfrac{1}{2}\int_{-1}^{1} t^2\,dt - \tfrac{3}{2}t\int_{-1}^{1} t^3\,dt \\
&= t^2 - \tfrac{1}{3}.
\end{aligned}$$

Since

$$(w_3, w_3) = \int_{-1}^{1} (t^2 - \tfrac{1}{3})^2\,dt = \tfrac{8}{45}$$

the third normalized basis vector is

$$e_3 = \frac{t^2 - \tfrac{1}{3}}{\sqrt{8/45}} = \sqrt{\tfrac{45}{8}}\,(t^2 - \tfrac{1}{3}) = \sqrt{\tfrac{5}{8}}\,(3t^2 - 1).$$

Finally, we calculate $w_4$:

$$\begin{aligned}
w_4 &= v_4 - e_1(e_1, v_4) - e_2(e_2, v_4) - e_3(e_3, v_4) \\
&= t^3 - \tfrac{1}{2}\int_{-1}^{1} t^3\,dt - \tfrac{3}{2}t\int_{-1}^{1} t^4\,dt - \tfrac{5}{8}(3t^2 - 1)\int_{-1}^{1} (3t^5 - t^3)\,dt \\
&= t^3 - 0 - \tfrac{3}{5}t - 0 = t^3 - \tfrac{3}{5}t.
\end{aligned}$$

Dividing by $\sqrt{(w_4, w_4)}$ we obtain finally

$$e_4 = \frac{w_4}{\sqrt{(w_4, w_4)}} = \sqrt{\tfrac{7}{8}}\,(5t^3 - 3t).$$

Clearly, proceeding in this manner, we could construct a sequence of orthogonal
polynomials of higher and higher degree. These polynomials, known as the
Legendre polynomials, will appear naturally in the solution of problems in
electrostatics using spherical polar coordinates. Indeed, it is usually true in physical
applications that vector spaces of functions, which frequently arise as solutions to
differential equations, have orthogonal bases which arise naturally from physical
considerations. For this reason it is rarely necessary in practice to carry out the
Gram-Schmidt process.
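The polynomial computation can be checked with exact rational arithmetic. The following is a minimal sketch, assuming a representation chosen only for this illustration: a polynomial is a list of coefficients `[c0, c1, c2, c3]`, and the helper names `inner` and `orthogonalize` are hypothetical. It produces the unnormalized vectors $w_1, \ldots, w_4$; dividing each by the square root of its squared length gives the $e_i$.

```python
from fractions import Fraction

def inner(f, g):
    """(f, g) = integral of f(t) g(t) over [-1, 1], for coefficient lists f, g."""
    total = Fraction(0)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            # The integral of t^(i+j) over [-1, 1] is 2/(i+j+1) for even
            # i+j and 0 for odd i+j.
            if (i + j) % 2 == 0:
                total += a * b * Fraction(2, i + j + 1)
    return total

def orthogonalize(basis):
    """Gram-Schmidt without the final normalization step."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            # Subtracting q (v, q)/(q, q) is the same as subtracting e (e, v)
            # for the unit vector e = q / sqrt((q, q)).
            c = inner(v, q) / inner(q, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        ortho.append(w)
    return ortho

# Ordered basis 1, t, t^2, t^3 as coefficient lists.
monomials = [[Fraction(int(i == k)) for i in range(4)] for k in range(4)]
w1, w2, w3, w4 = orthogonalize(monomials)

print(w3, inner(w3, w3))   # coefficients of t^2 - 1/3 and the value 8/45
print(w4, inner(w4, w4))   # coefficients of t^3 - (3/5) t and the value 8/175
```

The squared lengths 8/45 and 8/175 account for the normalizing factors $\sqrt{45/8}$ and $\sqrt{175/8}$, and $\sqrt{175/8}\,(t^3 - \tfrac{3}{5}t) = \sqrt{7/8}\,(5t^3 - 3t)$.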


4.3. Quadratic forms and symmetric matrices 

In sections 4.1 and 4.2 we have studied the Euclidean scalar product which satisfied 
three conditions: it was bilinear, symmetric, and positive-definite. We now want to 
investigate more general ‘scalar products’, which are not necessarily positive- 
definite. They play a central role in the theory of relativity. 

We return to $\mathbb{R}^2$. Suppose that we are given a scalar product, $\langle\ ,\ \rangle$ on $\mathbb{R}^2$, which
is not necessarily positive-definite. Thus we assume that $\langle\ ,\ \rangle$ is

bilinear: $\langle v, au + bw \rangle = a\langle v, u \rangle + b\langle v, w \rangle$
and

symmetric: $\langle u, v \rangle = \langle v, u \rangle$

for all vectors $u, v, w$ and all real numbers $a$ and $b$. We wish to compare $\langle\ ,\ \rangle$ with
the Euclidean scalar product $(\ ,\ )$. We begin with the following elementary lemma.


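Let $l: \mathbb{R}^2 \to \mathbb{R}$ be a linear map. Then there is a unique vector $w$ such that

$$l(v) = (v, w) \quad \text{for all } v \in \mathbb{R}^2.$$

Indeed, $l$ is given by a 1 x 2 matrix $(a \ b)$, i.e.

$$l(v) = ax + by \quad \text{for any } v = \begin{pmatrix} x \\ y \end{pmatrix} \in \mathbb{R}^2.$$

Then take

$$w = \begin{pmatrix} a \\ b \end{pmatrix}$$

so

$$(v, w) = \left( \begin{pmatrix} x \\ y \end{pmatrix}, \begin{pmatrix} a \\ b \end{pmatrix} \right) = ax + by$$

as desired, and it is clear that $w$ is the unique vector in $\mathbb{R}^2$ with this property. Now
consider $\langle u, v \rangle$ as a function of $v$ for fixed $u$. This is a linear function of $v$, hence there
is a vector $w$ such that

$$\langle u, v \rangle = (v, w) \quad \text{for all } v \in V.$$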

The vector $w$ depends on $u$, so we should write $w(u)$ in the above equation. To repeat,
$w(u)$ is that vector whose Euclidean scalar product with any $v$ equals $\langle u, v \rangle$. Let $u_1$
and $u_2$ be two vectors, and $w(u_1)$ and $w(u_2)$ their corresponding $w$s. Now

$$\begin{aligned}
\langle au_1 + bu_2, v \rangle &= \langle v, au_1 + bu_2 \rangle && \text{by symmetry} \\
&= a\langle v, u_1 \rangle + b\langle v, u_2 \rangle && \text{by bilinearity} \\
&= a\langle u_1, v \rangle + b\langle u_2, v \rangle && \text{by symmetry} \\
&= a(v, w(u_1)) + b(v, w(u_2)) \\
&= (v, aw(u_1) + bw(u_2)).
\end{aligned}$$

Thus $w(au_1 + bu_2) = aw(u_1) + bw(u_2)$. In other words, $w$ depends linearly on $u$. Thus
we can write $w(u) = Au$, where $A$ is a linear transformation. Going back to the
definition of $w = Au$, we see that

$$\langle u, v \rangle = (v, Au)$$

for all $u$ and $v$ in $\mathbb{R}^2$. So far we have only used the fact that $\langle u, v \rangle$ is bilinear, i.e., linear
in $u$ when $v$ is fixed and linear in $v$ when $u$ is fixed. (This is how we used the symmetry
of $\langle\ ,\ \rangle$.) Now let us use the fact that $\langle\ ,\ \rangle$ is symmetric. Since

$$\langle u, v \rangle = \langle v, u \rangle$$

this implies that

$$(v, Au) = (u, Av)$$

and, since $(u, v) = (v, u)$, that

$$(v, Au) = (Av, u)$$

for all $u$ and $v$ in $V$. Let us see what this says for the matrix $A$.

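For any matrix $B$, the expression $(Bv, u)$ is linear in $v$ and $u$ separately. Thus, by our
preceding argument, there is a unique linear transformation, call it $B^T$, the transpose
of $B$, such that

$$(Bv, u) = (v, B^T u)$$

for $v, u$ in $V$. To see what $B^T$ is, suppose

$$v = \begin{pmatrix} x \\ y \end{pmatrix}, \quad u = \begin{pmatrix} x' \\ y' \end{pmatrix} \quad \text{and} \quad B = \begin{pmatrix} e & f \\ g & h \end{pmatrix}.$$

Then

$$\begin{aligned}
(Bv, u) &= (ex + fy)x' + (gx + hy)y' \\
&= exx' + fyx' + gxy' + hyy' \\
&= x(ex' + gy') + y(fx' + hy'),
\end{aligned}$$

so

$$B^T \begin{pmatrix} x' \\ y' \end{pmatrix} = \begin{pmatrix} ex' + gy' \\ fx' + hy' \end{pmatrix}, \quad \text{i.e.,} \quad B^T = \begin{pmatrix} e & g \\ f & h \end{pmatrix}.$$

In other words, the transpose of a matrix is obtained by flipping the matrix along the
diagonal. The condition $(v, Au) = (Av, u)$ found above therefore says that $A^T = A$;
that is, $A$ is a symmetric matrix,

$$A = \begin{pmatrix} a & b \\ b & c \end{pmatrix},$$

and

$$Q(v) = (Av, v) = ax^2 + 2bxy + cy^2 \quad \text{if} \quad v = \begin{pmatrix} x \\ y \end{pmatrix}.$$

A function $Q$ of this type is called a quadratic form. Thus by the preceding formulas,
each quadratic form $Q$ determines a scalar product $\langle\ ,\ \rangle$, and every scalar product
determines a quadratic form.

The coefficients $a, 2b, c$ of the quadratic polynomial $Q(v)$ give us the matrix $A$,
which is just another way of saying that $Q$ determines $A$ and hence also $\langle\ ,\ \rangle$.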
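The passage from a symmetric bilinear form to its matrix can be made concrete. In the sketch below, the particular form (with $a = 1$, $b = 2$, $c = -3$) and all function names are assumptions made for illustration. The matrix $A$ is read off by evaluating the form on the standard basis vectors, and the relations $\langle u, v \rangle = (v, Au)$, $A^T = A$ and $Q(v) = ax^2 + 2bxy + cy^2$ are then checked on sample vectors.

```python
def form(u, v):
    """A hypothetical symmetric bilinear form <u, v>, not positive-definite."""
    return u[0] * v[0] + 2 * (u[0] * v[1] + u[1] * v[0]) - 3 * u[1] * v[1]

def dot(v, w):
    """Euclidean scalar product."""
    return v[0] * w[0] + v[1] * w[1]

def mat_vec(B, v):
    return (B[0][0] * v[0] + B[0][1] * v[1], B[1][0] * v[0] + B[1][1] * v[1])

def transpose(B):
    return [[B[j][i] for j in range(2)] for i in range(2)]

def matrix_of_form(f):
    """Matrix A with <u, v> = (v, A u): the (i, j) entry is (e_i, A e_j) = <e_j, e_i>."""
    e = [(1.0, 0.0), (0.0, 1.0)]
    return [[f(e[j], e[i]) for j in range(2)] for i in range(2)]

A = matrix_of_form(form)

def Q(v):
    """Quadratic form Q(v) = (A v, v)."""
    return dot(mat_vec(A, v), v)

u, v = (1.0, -2.0), (3.0, 5.0)
print(A, form(u, v), dot(v, mat_vec(A, u)), Q((2.0, 1.0)))
```

Here `A` comes out as the symmetric matrix with rows (1, 2) and (2, -3), and `Q((2.0, 1.0))` equals $1 \cdot 4 + 4 \cdot 2 - 3 \cdot 1 = 9$.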




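The characteristic polynomial of $A$ is

$$\lambda^2 - (a + c)\lambda + (ac - b^2),$$

and the discriminant of this quadratic polynomial is

$$(a + c)^2 - 4(ac - b^2) = (a - c)^2 + 4b^2 \geq 0,$$

so the eigenvalues of $A$ are real.
This expression, $(a - c)^2 + 4b^2$, is called the discriminant of the quadratic form $Q$.
The discriminant can equal zero if and only if

$$a = c \quad \text{and} \quad b = 0$$

so

$$A = \begin{pmatrix} a & 0 \\ 0 & a \end{pmatrix} = aI$$

and

$$\langle u, v \rangle = a(u, v).$$

In this case, $\langle\ ,\ \rangle$ is just a scalar multiple of $(\ ,\ )$.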

Suppose that $A$ has two distinct eigenvalues, $\lambda_1 \neq \lambda_2$, corresponding to eigenvectors
$v_1$ and $v_2$. We claim that $v_1$ and $v_2$ are orthogonal, i.e., that $(v_1, v_2) = 0$. The proof
is easy:

$$(Av_1, v_2) = (v_1, Av_2) \quad \text{because } A \text{ is symmetric;}$$
$$\lambda_1(v_1, v_2) = \lambda_2(v_1, v_2) \quad \text{because the scalar product is linear;}$$
$$(v_1, v_2) = 0 \quad \text{because } \lambda_1 \neq \lambda_2.$$

Conversely, suppose that we start with an eigenvector $v_1$ of $A$ corresponding to
the eigenvalue $\lambda_1$, and let $v_2$ be a non-zero vector with

$$(v_1, v_2) = 0.$$

Then

$$(v_1, Av_2) = (Av_1, v_2) = \lambda_1(v_1, v_2) = 0$$

so $Av_2$ is again orthogonal to $v_1$. But there is only one line perpendicular to $v_1$, and
so $Av_2$ must be a multiple of $v_2$; that is, $v_2$ is also an eigenvector of $A$.


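We have thus shown that any symmetric matrix $A$ has two orthogonal
eigenvectors, $v_1$ and $v_2$. By multiplying $v_1$ and $v_2$ by suitable scalars, we can arrange
that $v_1$ and $v_2$ both have length 1, and that the matrix $\begin{pmatrix} x_1 & x_2 \\ y_1 & y_2 \end{pmatrix}$, where $v_1 = \begin{pmatrix} x_1 \\ y_1 \end{pmatrix}$
and $v_2 = \begin{pmatrix} x_2 \\ y_2 \end{pmatrix}$, is a rotation.

Thus $A = R\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}R^{-1}$ for some suitable rotation $R$. Since an orthogonal
matrix $M$ satisfies $(Mv, Mv) = (v, M^TMv) = (v, v)$ for all $v$, we see that $M^T = M^{-1}$,
and we can equivalently write

$$A = R\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}R^T.$$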


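Suppose that we have chosen our eigenvectors so that $\lambda_1 > \lambda_2$. Then the
eigenvector $v_1$, which has been chosen to have unit length, can be characterized
as a vector of unit length for which $Q(v)$ takes on its maximum value: i.e.,

$$Q(v_1) \geq Q(v)$$

for any $v$ with $(v, v) = 1$. To prove this statement, we write $v = v_1\cos\theta + v_2\sin\theta$.
Clearly, since the eigenvectors $v_1$ and $v_2$ are orthogonal and have unit length,

$$(v, v) = (v_1, v_1)\cos^2\theta + (v_2, v_2)\sin^2\theta = 1.$$

Figure 4.7

Then

$$\begin{aligned}
Q(v) &= (Av, v) \\
&= (Av_1\cos\theta + Av_2\sin\theta, \ v_1\cos\theta + v_2\sin\theta) \\
&= (\lambda_1v_1\cos\theta + \lambda_2v_2\sin\theta, \ v_1\cos\theta + v_2\sin\theta) \\
&= \lambda_1(v_1, v_1)\cos^2\theta + \lambda_2(v_2, v_2)\sin^2\theta \quad \text{since } (v_1, v_2) = 0 \\
&= \lambda_1 - (\lambda_1 - \lambda_2)\sin^2\theta.
\end{aligned}$$

Clearly $Q(v)$ achieves its maximum value when $\sin^2\theta = 0$ (when $v = \pm v_1$) and its
minimum value when $\sin^2\theta = 1$ (when $v = \pm v_2$).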

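To describe the graph of $Q$, we diagonalize $A$ by a rotation $R$:

$$A = R\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}R^T$$

so that

$$Q(v) = \left( R\begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}R^T v, \ v \right).$$

Since $R$ is orthogonal, $R^T = R^{-1}$, and we have

$$Q(v) = \left( \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix}R^{-1}v, \ R^{-1}v \right).$$

If we write $\begin{pmatrix} x' \\ y' \end{pmatrix} = R^{-1}\begin{pmatrix} x \\ y \end{pmatrix} = R^{-1}v$, then

$$Q(v) = \lambda_1x'^2 + \lambda_2y'^2.$$

If $\lambda_1$ and $\lambda_2$ are both positive, the graph of $Q(v) = k$ is an ellipse if $k > 0$, the origin
only if $k = 0$, empty if $k < 0$. If $\lambda_1$ and $\lambda_2$ are of opposite sign, the graph is a hyperbola, which
degenerates to two straight lines if $k = 0$. The vertices of the ellipse or hyperbola,
where the distance from the origin is a local extremum, lie along the lines determined
by the eigenvectors of $A$.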

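Suppose, for example, that $A = \begin{pmatrix} 9 & 2 \\ 2 & 6 \end{pmatrix}$, so that

$$Q(v) = 9x^2 + 4xy + 6y^2.$$

The eigenvalues of $A$ are $\lambda_1 = 10$, $\lambda_2 = 5$, with associated eigenvectors $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and
$\begin{pmatrix} -1 \\ 2 \end{pmatrix}$. We can write $A = R\begin{pmatrix} 10 & 0 \\ 0 & 5 \end{pmatrix}R^{-1}$, where $R$ is the rotation

$$R = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & -1 \\ 1 & 2 \end{pmatrix}.$$

By introducing new coordinates

$$\begin{pmatrix} x' \\ y' \end{pmatrix} = \frac{1}{\sqrt{5}}\begin{pmatrix} 2 & 1 \\ -1 & 2 \end{pmatrix}\begin{pmatrix} x \\ y \end{pmatrix},$$

i.e.,

$$x' = \frac{1}{\sqrt{5}}(2x + y), \qquad y' = \frac{1}{\sqrt{5}}(-x + 2y),$$

we obtain

$$Q(v) = 10x'^2 + 5y'^2.$$

The graph of $Q(v) = 1$, i.e., of

$$10x'^2 + 5y'^2 = 1$$

is an ellipse of minor axis $1/\sqrt{10}$, major axis $1/\sqrt{5}$. The axes coincide with the
eigenvectors of $A$: $\begin{pmatrix} 2 \\ 1 \end{pmatrix}$ and $\begin{pmatrix} -1 \\ 2 \end{pmatrix}$.

Figure 4.8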
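The example with $Q(v) = 9x^2 + 4xy + 6y^2$ can be verified directly. The sketch below, whose helper names are chosen only for this illustration, forms $R^TAR$ for the rotation whose columns are the unit eigenvectors $(2, 1)/\sqrt{5}$ and $(-1, 2)/\sqrt{5}$, and compares $Q$ with $10x'^2 + 5y'^2$ at a sample point.

```python
import math

A = [[9.0, 2.0], [2.0, 6.0]]

s = 1.0 / math.sqrt(5.0)
# Columns of R are the unit eigenvectors (2, 1)/sqrt(5) and (-1, 2)/sqrt(5).
R = [[2 * s, -1 * s],
     [1 * s, 2 * s]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

# R^T A R should be the diagonal matrix with entries 10 and 5.
D = matmul(transpose(R), matmul(A, R))

def Q(x, y):
    return 9 * x * x + 4 * x * y + 6 * y * y

def Q_in_new_coordinates(x, y):
    xp = s * (2 * x + y)     # x' = (2x + y)/sqrt(5)
    yp = s * (-x + 2 * y)    # y' = (-x + 2y)/sqrt(5)
    return 10 * xp * xp + 5 * yp * yp

print(D)
print(Q(1.5, -0.7), Q_in_new_coordinates(1.5, -0.7))
```

Up to rounding, `D` is diagonal with entries 10 and 5, and the two evaluations of the quadratic form agree.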



Suppose we allow not only rotations as changes of coordinates but also non-orthogonal
transformations such as $x'' = \alpha x'$ and $y'' = \beta y'$. Then, in terms of $x''$
and $y''$, we have

$$Q(v) = \frac{\lambda_1}{\alpha^2}x''^2 + \frac{\lambda_2}{\beta^2}y''^2.$$

If $\lambda_1 \neq 0$, we can choose $\alpha^2 = |\lambda_1|$ so that $\lambda_1/\alpha^2 = \pm 1$, and similarly for $\lambda_2$. We have
thus proved:

Let $Q$ be any quadratic form in $\mathbb{R}^2$. We can then find coordinates $x''$ and $y''$ such
that $Q$ has one of the following expressions:

$$Q(v) = \begin{cases} x''^2 + y''^2 \\ x''^2 \\ 0 \\ -x''^2 \\ x''^2 - y''^2 \\ -x''^2 - y''^2. \end{cases}$$

If there are two plus signs, $Q(v)$ has a minimum at $v = 0$; if two minus signs, a
maximum. If there is one plus sign, one minus sign, the origin is a saddle point.

Figure 4.9 Figure 4.10
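Since only the signs of $\lambda_1$ and $\lambda_2$ survive the rescaling, the normal form of $Q(v) = ax^2 + 2bxy + cy^2$ can be read off from the eigenvalues of the matrix with rows $(a, b)$ and $(b, c)$. The sketch below is one possible way to do this; the function names and the tolerance `tol` used to decide when an eigenvalue counts as zero are assumptions of this illustration.

```python
import math

def eigenvalues(a, b, c):
    """Eigenvalues of the symmetric matrix with rows (a, b) and (b, c), larger one first."""
    mean = (a + c) / 2
    half_root = math.sqrt((a - c) ** 2 + 4 * b ** 2) / 2   # always real
    return mean + half_root, mean - half_root

def normal_form(a, b, c, tol=1e-12):
    """Signs (+1, 0 or -1) of the coefficients of x''^2 and y''^2 in the normal form."""
    signs = []
    for lam in eigenvalues(a, b, c):
        if lam > tol:
            signs.append(1)
        elif lam < -tol:
            signs.append(-1)
        else:
            signs.append(0)
    return tuple(signs)

print(normal_form(9, 2, 6))     # two plus signs: minimum at the origin
print(normal_form(1, 2, -3))    # one plus, one minus: saddle
print(normal_form(1, 1, 1))     # Q = (x + y)^2: one plus sign, one zero
print(normal_form(-2, 0, -3))   # two minus signs: maximum at the origin
```

The square root is taken of the discriminant $(a - c)^2 + 4b^2$, which is never negative, so the classification never needs complex numbers.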



4.4. Normal modes 

One of the most important applications of the results of the preceding section is to
the theory of coupled oscillators. To explain what is involved, consider the following
mechanical system. We have two undamped oscillators which we connect by a
spring with spring constant $k$. The equations of motion, from Newton's laws, are

$$m_1\ddot{x}_1 = -k_1x_1 - k(x_1 - x_2),$$
$$m_2\ddot{x}_2 = -k_2x_2 - k(x_2 - x_1)$$

or

$$T\begin{pmatrix} \ddot{x}_1 \\ \ddot{x}_2 \end{pmatrix} = -H\begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

where the symmetric matrices $T$ and $H$ are

$$T = \begin{pmatrix} m_1 & 0 \\ 0 & m_2 \end{pmatrix}, \qquad H = \begin{pmatrix} k_1 + k & -k \\ -k & k_2 + k \end{pmatrix}.$$

Figure 4.11 Uncoupled oscillators

Figure 4.12 Coupled oscillators

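Our strategy will be to try to simultaneously diagonalize $T$ and $H$, so as to
'uncouple' the equations. Let us discuss the general case. We want to consider two
symmetric matrices $T$ and $H$ where $T$ is positive-definite. Our first claim is that we
can find a positive-definite matrix $B$ such that

$$T = B^2.$$

Indeed, if $T$ is diagonal, as in our example, set

$$B = \begin{pmatrix} \sqrt{m_1} & 0 \\ 0 & \sqrt{m_2} \end{pmatrix}.$$

Otherwise, we can find a rotation $R_\theta$ such that

$$T = R_\theta \Lambda R_\theta^T \quad \text{where } \Lambda \text{ is a diagonal matrix.}$$

Since $T$ is positive-definite, the diagonal entries of $\Lambda$ are positive. Let $C$ be the
diagonal matrix whose entries are their positive square roots, so that $C^2 = \Lambda$. Then

$$B = R_\theta C R_\theta^T$$

is symmetric, positive-definite, and satisfies $B^2 = T$. Now define

$$w = Bv$$

so

$$v = B^{-1}w.$$

Then

$$\ddot{v} = B^{-1}\ddot{w}$$

and the equation $T\ddot{v} = -Hv$ becomes

$$TB^{-1}\ddot{w} = -HB^{-1}w$$

or, since $T = B^2$,

$$\ddot{w} = -Aw \quad \text{where} \quad A = B^{-1}HB^{-1}.$$

Note that $A$ is again symmetric, so we have reduced the problem to the case where
$T = I$. (The astute reader may have noticed that, from a geometric point of view, we
have made a change of coordinates so that the quadratic form associated
to $T$ takes on the normal form $x^2 + y^2$.)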
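For the diagonal $T$ of the two-oscillator example, the reduction to $\ddot{w} = -Aw$ is easy to carry out explicitly. The following sketch assumes $T$ is diagonal with entries $m_1, m_2$, so that $B$ is diagonal with entries $\sqrt{m_1}, \sqrt{m_2}$; the numerical masses and spring constants and the function names are hypothetical values chosen for illustration.

```python
import math

def reduced_matrix(m1, m2, k1, k2, k):
    """A = B^{-1} H B^{-1} for T = diag(m1, m2), B = diag(sqrt(m1), sqrt(m2))."""
    b1, b2 = math.sqrt(m1), math.sqrt(m2)
    H = [[k1 + k, -k], [-k, k2 + k]]
    # Multiplying by the diagonal matrix B^{-1} on both sides divides the
    # (i, j) entry of H by b_i * b_j.
    return [[H[0][0] / (b1 * b1), H[0][1] / (b1 * b2)],
            [H[1][0] / (b2 * b1), H[1][1] / (b2 * b2)]]

def squared_frequencies(A):
    """Eigenvalues of the symmetric 2 x 2 matrix A, smaller one first."""
    a, b, c = A[0][0], A[0][1], A[1][1]
    mean = (a + c) / 2
    half_root = math.sqrt((a - c) ** 2 + 4 * b * b) / 2
    return mean - half_root, mean + half_root

# Identical oscillators (assumed values): m = 1, k1 = k2 = 4, coupling k = 0.5.
A_equal = reduced_matrix(1.0, 1.0, 4.0, 4.0, 0.5)
print(A_equal, squared_frequencies(A_equal))

# Unequal masses (assumed values): A is still symmetric.
A_unequal = reduced_matrix(1.0, 4.0, 1.0, 1.0, 1.0)
print(A_unequal)
```

When both eigenvalues are positive, their square roots are the frequencies of the two normal modes; in the identical case above they are $\sqrt{4} = 2$ and $\sqrt{5}$.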

To solve the equation $\ddot{w} = -Aw$, all we have to do is to find the eigenvalues and
eigenvectors of $A$. Suppose that $v_1$ is an eigenvector of $A$ with eigenvalue $\omega_1^2 > 0$.
Then, for any choice of amplitude $\rho$ and phase $\alpha$, the function

$$w(t) = \rho\cos(\omega_1t + \alpha)v_1$$

is clearly a solution. Similarly for the second eigenvector and eigenvalue, giving
$\rho\cos(\omega_2t + \alpha)v_2$. These are called the normal modes of oscillation of the vibrating
system.

Suppose that

$$A = RDR^{-1}$$

where $D$ is a diagonal matrix. Then writing

$$w = Ru$$

we have

$$\ddot{w} = R\ddot{u} = -RDR^{-1}Ru$$

or

$$\ddot{u} = -Du.$$

Since $D$ is diagonal, this is just two separate differential equations for each of the
components. Assume that the eigenvalues of $A$ are both positive, say $\omega_1^2$ and $\omega_2^2$.
Then the general solution of

$$\ddot{u} = -Du$$

is

$$\begin{pmatrix} u_1 \\ u_2 \end{pmatrix} = \begin{pmatrix} \rho_1\cos(\omega_1t + \alpha_1) \\ \rho_2\cos(\omega_2t + \alpha_2) \end{pmatrix}.$$

Since

$$v_1 = R\begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad v_2 = R\begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

are the two eigenvectors of $A$, we see that the most general solution of $\ddot{w} = -Aw$ is

$$w = \rho_1\cos(\omega_1t + \alpha_1)v_1 + \rho_2\cos(\omega_2t + \alpha_2)v_2.$$

Thus the general solution is a 'superposition' of normal modes.

Let us illustrate this result in the case of two identical coupled springs. We thus 

. = Tn the absence of the coupling, the equation of 




x=-a>lx, (ti n = kjm, = 




x, = —{co t 


■s coi + s 


By symmetry we see that the eigenvectors of A are 


with eigenvalue col 


with eigenvalue col + 2s. 


These are the two normal modes of oscillation in this case. The first corresponds to 
the bobs moving in tandem, the second to their moving in opposite directions. 


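Figure 4.13

Let

\omega' = (\omega_0^2 + 2s)^{1/2}.

Then the general solution is

x_1 = \rho_1 \cos(\omega_0 t + \alpha_1) + \rho_2 \cos(\omega' t + \alpha_2),

x_2 = \rho_1 \cos(\omega_0 t + \alpha_1) - \rho_2 \cos(\omega' t + \alpha_2).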


Let us examine the particular solution where we excite one spring and let it go at
time t = 0. Thus we wish to consider the initial conditions

x_1(0) = C,   \dot{x}_1(0) = 0,
x_2(0) = 0,   \dot{x}_2(0) = 0.

Substituting into the above equations, we see that \rho_1 = \rho_2 = C/2 and \alpha_1 = \alpha_2 = 0.
The particular solution is thus

x_1 = (C/2)(\cos \omega_0 t + \cos \omega' t),

x_2 = (C/2)(\cos \omega_0 t - \cos \omega' t).

Recall that \cos \alpha = (1/2)(e^{i\alpha} + e^{-i\alpha}) and therefore

\cos \alpha + \cos \beta = 2 \cos((\alpha - \beta)/2) \cos((\alpha + \beta)/2)

and similarly

\cos \alpha - \cos \beta = -2 \sin((\alpha - \beta)/2) \sin((\alpha + \beta)/2).


Substituting \alpha = \omega_0 t and \beta = \omega' t, we see that our particular solution is given by

x_1 = C \cos((1/2)(\omega' - \omega_0)t) \cos((1/2)(\omega' + \omega_0)t),

x_2 = C \sin((1/2)(\omega' - \omega_0)t) \sin((1/2)(\omega' + \omega_0)t).

In the case of small coupling \omega' - \omega_0 is a small quantity, and (1/2)(\omega' + \omega_0) is close
to \omega_0. If we graph the motion of both springs, we get figure 4.14. The oscillations of
each spring (with frequency close to the natural frequency \omega_0) are modulated. The
beats are determined by the modulating factors \cos((1/2)(\omega' - \omega_0)t) and
\sin((1/2)(\omega' - \omega_0)t). The energy alternates between the two springs; when
one oscillates with maximum amplitude, the other is at rest. This phenomenon is
known as resonance.
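
The passage from a sum of two normal modes to a slowly modulated oscillation can be checked numerically. The sketch below compares the sum form (C/2)(\cos \omega_0 t ± \cos \omega' t) with the product of a slow envelope and a fast oscillation; the function names and the values of C, \omega_0 and s are illustrative assumptions.

```python
import math


def particular_solution(C, omega0, omega_prime, t):
    """x1, x2 written as sums of the two normal modes (rho1 = rho2 = C/2)."""
    x1 = 0.5 * C * (math.cos(omega0 * t) + math.cos(omega_prime * t))
    x2 = 0.5 * C * (math.cos(omega0 * t) - math.cos(omega_prime * t))
    return x1, x2


def product_form(C, omega0, omega_prime, t):
    """The same motion written as slow envelope times fast oscillation."""
    half_diff = 0.5 * (omega_prime - omega0) * t
    half_sum = 0.5 * (omega_prime + omega0) * t
    return (C * math.cos(half_diff) * math.cos(half_sum),
            C * math.sin(half_diff) * math.sin(half_sum))


C, omega0, s = 1.0, 2.0, 0.1                 # illustrative values only
omega_prime = math.sqrt(omega0 ** 2 + 2 * s)

# Largest disagreement between the two forms over a grid of times.
max_gap = max(
    abs(a - b)
    for k in range(2000)
    for a, b in zip(particular_solution(C, omega0, omega_prime, 0.05 * k),
                    product_form(C, omega0, omega_prime, 0.05 * k))
)
print(max_gap)
```

When the envelope of x_1 is at a zero, the envelope of x_2 is at its maximum, which is the alternation of energy described above.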



Figure 4.14 


In case the two springs are not identical, but are nearly so, the
behavior is similar. There will still be modulated harmonic motion at both springs.
The second spring will come to rest at periodic intervals, but the first will continue to
oscillate even when the second is oscillating at maximum amplitude. Imperfect
‘tuning’ results in an incomplete transfer of energy from the first spring to the second.
We will leave the details, which are a straightforward, if somewhat messy,
calculation of eigenvectors and eigenvalues of A, as an exercise to the reader.


4.5. Normal modes in higher dimensions 

Let V be an n-dimensional vector space equipped with a positive-definite scalar
product ( , ). Let < , > be some other, not necessarily positive-definite, scalar product. An
examination of the argument given in section 4.3 will show that there exists a linear
transformation A: V -> V such that

<u, v> = (Au, v) for all u, v in V.

Since < , > is symmetric, we have (Au, v) = (u, Av) for all u and v.
We claim that we can turn the argument of section 4.3 around to show that
A has n mutually perpendicular eigenvectors. Indeed, consider the quadratic form

Q(v) = (Av, v)

restricted to the unit sphere

{v | ||v|| = 1}.

This function is continuous and is bounded. Indeed, if all the entries A_{ij} of A satisfy

|A_{ij}| \le M

for some number M, then if

v = (x_1, ..., x_n)^T

we have ||v||^2 = \sum x_i^2 = 1 so |x_i| \le 1 for all i and

(Av, v) = \sum A_{ij} x_i x_j

so

|(Av, v)| \le n^2 M.

Let v be a point on the unit sphere where Q(v) takes on its maximum value. (At this
juncture, we are really using some deep properties of the real number system which
guarantee that there will indeed exist a point on the sphere where Q takes on its
maximum value.) We claim that v is an eigenvector of A. Indeed, define the vector w
by

w = Av - (Av, v)v.

We will show that w = 0 if Q takes its maximum at v. Since (v, v) = 1, the vector w is
perpendicular to v:

(w, v) = 0,

and hence

(Av, w) = ||w||^2.

Then, for any real number s,

||v + sw||^2 = (v + sw, v + sw) = ||v||^2 + s^2 ||w||^2 = 1 + s^2 ||w||^2

and

(A(v + sw), v + sw) = (Av, v) + s(Aw, v) + s(Av, w) + s^2 (Aw, w)

or, since (Aw, v) = (w, Av) = (Av, w),

(A(v + sw), v + sw) = (Av, v) + 2s(Av, w) + s^2 (Aw, w)
= (Av, v) + 2s ||w||^2 + s^2 (Aw, w).


Let us rescale the vector v + sw so as to make it of unit length: replace it by

u = (v + sw) / ||v + sw||.

Then

f(s) = Q(u) = (Au, u) = (A(v + sw), v + sw) / ||v + sw||^2

= ((Av, v) + 2s ||w||^2 + s^2 (Aw, w)) / (1 + s^2 ||w||^2).

This expression is a differentiable function of s. By hypothesis, it has a maximum at
s = 0. We conclude that its derivative, f'(0), at s = 0 must vanish. But f'(0) = 2||w||^2.
So ||w||^2 = 0 and hence w = 0. Thus

Av = (Av, v)v.

In other words, v is an eigenvector of A with eigenvalue (Av, v). Call this eigenvector
v_1 and the eigenvalue (Av_1, v_1) = \lambda_1.
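
The key step in this argument, that the rescaled quadratic form has slope 2||w||^2 at s = 0, can be checked numerically. The sketch below uses a sample symmetric matrix and a unit vector that is not an eigenvector, so the slope is strictly positive and v cannot be a maximum; the matrix, the vector and the helper names are illustrative assumptions.

```python
def mat_vec(A, v):
    """Product of a square matrix (list of rows) with a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Euclidean scalar product."""
    return sum(a * b for a, b in zip(u, v))


def rayleigh_along(A, v, w, s):
    """f(s) = (A(v + sw), v + sw) / ||v + sw||^2."""
    z = [vi + s * wi for vi, wi in zip(v, w)]
    return dot(mat_vec(A, z), z) / dot(z, z)


A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]   # symmetric sample
v = [1.0, 0.0, 0.0]                                       # unit vector
Av = mat_vec(A, v)
q = dot(Av, v)
w = [avi - q * vi for avi, vi in zip(Av, v)]              # w = Av - (Av, v)v

# Central difference approximation to f'(0), compared with 2||w||^2.
h = 1e-6
numeric_slope = (rayleigh_along(A, v, w, h) - rayleigh_along(A, v, w, -h)) / (2 * h)
predicted_slope = 2 * dot(w, w)
print(numeric_slope, predicted_slope)
```

Here w is non-zero, so moving from v in the direction of w increases Q; only at an eigenvector does this direction of increase disappear.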

Now consider the space of all vectors z in V which are perpendicular to v_1. Thus
we look at all z such that

(z, v_1) = 0.

For such z,

(Az, v_1) = (z, Av_1) = \lambda_1 (z, v_1) = 0,   \lambda_1 = (Av_1, v_1),

so Az is again perpendicular to v_1.

Consider the set of all z of unit length in this space, that is, the set of all z such that

||z|| = 1,   (z, v_1) = 0.

Figure 4.15

Let v_2 be a point where Q takes a maximum among these vectors. Write

Av_2 = (Av_2, v_2)v_2 + w_2.

We conclude first that (w_2, v_1) = 0 and that (w_2, v_2) = 0, then that

(A(v_2 + sw_2), v_2 + sw_2) / ||v_2 + sw_2||^2

= ((Av_2, v_2) + 2s ||w_2||^2 + s^2 (Aw_2, w_2)) / (1 + s^2 ||w_2||^2)

has a maximum at s = 0 and hence that w_2 = 0, and v_2 is an eigenvector of A.

We keep proceeding in this manner: look at all z satisfying (z, v_1) = (z, v_2) = 0 and
||z|| = 1, etc. At each stage, we produce a new eigenvector of A, perpendicular to all
the previous ones. When does it all come to an end? When we run out of non-zero
vectors perpendicular to v_1, ..., v_k. This can happen only if k = n. Indeed, k cannot
be > n since then v_1, ..., v_{n+1} would be mutually perpendicular and hence linearly
independent. This contradicts the assumption that V has no n + 1 linearly
independent vectors (one of the hypotheses is the assumption that V is
n-dimensional). On the other hand, if k < n, the equations

(v_1, w) = 0
...
(v_k, w) = 0

in R^n are a system of k homogeneous linear equations in n unknowns. Such a system
with k < n always has a non-zero solution. We will prove this general fact among
others in Chapter 10. Here is a sketch of why there is a solution other than w = 0.
If the nth component of v_k is \ne 0, i.e.,

v_k = (x_1, ..., x_n)^T   with x_n \ne 0,

then the last equation is

x_1 w_1 + ... + x_n w_n = 0

which we can solve for w_n in terms of w_1, ..., w_{n-1}:

w_n = -(1/x_n)(x_1 w_1 + ... + x_{n-1} w_{n-1}).

Substituting this into the preceding equations gives k - 1 equations in n - 1
unknowns and we can proceed by induction. If the nth component of any of the
vectors v_1, ..., v_k does not vanish, we can still do the same - just use the v_j with
non-vanishing nth component to solve for w_n. If the nth components of all the
v_j vanish, then the vector

w = (0, ..., 0, 1)^T

is a solution (all the first n - 1 components vanish).

So we must keep on going until k = n.



Normal Modes as Waves

Let us now work out the normal modes of a system of n identical mass points,
each joined to its nearest neighbors by a
spring, with all the springs identical as well. Thus the force acting on the ith mass
point is

-(k(x_i - x_{i+1}) + k(x_i - x_{i-1})).

Newton’s equations then say

m \ddot{x}_i = -k(2x_i - x_{i-1} - x_{i+1}).

We will also assume that the first and last points are connected by the same kind of
spring, so we can imagine the points arranged in a circle.


Figure 4.16 


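Thus with \omega^2 = k/m, the equations are

\ddot{x} = -\omega^2 A x

where A is the matrix

A = \begin{pmatrix}
 2 & -1 &  0 & \cdots &  0 & -1 \\
-1 &  2 & -1 & \cdots &  0 &  0 \\
 0 & -1 &  2 & \cdots &  0 &  0 \\
\vdots & & & \ddots & & \vdots \\
-1 &  0 &  0 & \cdots & -1 &  2
\end{pmatrix}.
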

Our problem is to find the eigenvalues and eigenvectors of A. Before describing the 
general solution, let us work out a few low-dimensional cases, beginning with the 
case n = 3. 

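\begin{pmatrix} 2 & -1 & -1 \\ -1 & 2 & -1 \\ -1 & -1 & 2 \end{pmatrix} \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} = 0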





In order to deal with the n-dimensional case, we shall introduce some
methodology of far reaching significance. Notice that the problem is invariant under
the ‘rotation’ sending the first point into the nth, the second into the first, etc.,
with the nth into the (n - 1)st. This is the matrix

S = \begin{pmatrix}
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
1 & 0 & 0 & \cdots & 0
\end{pmatrix}.

It is easy to check that

SA = AS.



We shall find eigenvectors of S. If Sw = \lambda w, then SAw = ASw = A(\lambda w) = \lambda Aw. So if w is
an eigenvector of S with eigenvalue \lambda, so is Aw. We will find n distinct eigenvalues of
S. Then if Sw = \lambda w, Aw will have to be a multiple of w - hence w is an eigenvector of A.

The ‘eigenvalues’ of S that we will find will be complex numbers and the
‘eigenvectors’ will have complex entries. Both the real and imaginary parts of these
eigenvectors will be eigenvectors of A. Here are the details:

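Let

\tau = e^{2\pi i/n}

and, for k = 1, ..., n, let e_k be the vector

e_k = (1, \tau^k, \tau^{2k}, ..., \tau^{(n-1)k})^T.

Then Se_1 = \tau e_1, Se_2 = \tau^2 e_2, etc. The eigenvalues 1, \tau, \tau^2, ..., \tau^{n-1} are all distinct. Thus each of the eigenvectors
of S must be an eigenvector of A. Let us call these ‘eigenvectors’ e_1, ..., e_n. We know
that

Ae_k = \lambda_k e_k

for some number \lambda_k. Now the entry in the second row of Ae_k is

-1 + 2\tau^k - \tau^{2k} = (-\tau^{-k} + 2 - \tau^k)\tau^k
= 2(1 - \cos(2\pi k/n))\tau^k.

We conclude that the kth eigenvalue is

\lambda_k = 2(1 - \cos(2\pi k/n)).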

This is the same eigenvalue for k and for n - k. We may thus get real eigenvectors
by adding and subtracting the eigenvectors for k and for n - k. Thus

(1, \cos(2\pi k/n), \cos(4\pi k/n), \cos(6\pi k/n), ...)^T   and   (0, \sin(2\pi k/n), \sin(4\pi k/n), \sin(6\pi k/n), ...)^T

are orthogonal eigenvectors with eigenvalue

2(1 - \cos(2\pi k/n)).

If n = 2m is even, then the second vector vanishes for k = m (as it does for k = n). Otherwise
neither vector vanishes. We can thus consider each normal mode of the system
as a sine or cosine ‘wave’ of compression of the system.
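
The eigenvalue formula can be tested for small n by applying the matrix directly to the complex vectors built from powers of \tau. In the sketch below, ring_matrix builds the matrix with 2 on the diagonal and -1 linking cyclic neighbors; the function names are illustrative, and the check is restricted to n \ge 3 so that the two neighbors of each point are distinct.

```python
import cmath
import math


def ring_matrix(n):
    """n x n matrix with 2 on the diagonal and -1 linking cyclic neighbors."""
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        A[i][i] += 2.0
        A[i][(i + 1) % n] -= 1.0
        A[i][(i - 1) % n] -= 1.0
    return A


def mode(n, k):
    """Complex vector (1, tau^k, tau^(2k), ...), with tau = exp(2 pi i / n)."""
    tau = cmath.exp(2j * math.pi / n)
    return [tau ** (k * j) for j in range(n)]


def residual(n, k):
    """Largest entry of A e_k - lambda_k e_k, lambda_k = 2(1 - cos(2 pi k / n))."""
    A, e = ring_matrix(n), mode(n, k)
    lam = 2.0 * (1.0 - math.cos(2.0 * math.pi * k / n))
    Ae = [sum(A[i][j] * e[j] for j in range(n)) for i in range(n)]
    return max(abs(Ae[i] - lam * e[i]) for i in range(n))


# Worst residual over all modes for rings of 3 to 8 points.
worst = max(residual(n, k) for n in range(3, 9) for k in range(n))
print(worst)
```

Since the matrix is real, the real and imaginary parts of each complex vector satisfy the same eigenvalue equation, which gives the cosine and sine waves.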





We shall see that the geometry of this space gives a good model for understanding
special relativity. We use the word model in the following sense. Our ordinary
space is three-dimensional. Therefore, if we add time as an additional dimension,
we get a four-dimensional spacetime. In our model, we shall imagine that space
is one-dimensional, so that our spacetime becomes two-dimensional instead of
four, and we will be able to draw all the geometric constructs. Actually, most of
what we have to say works in the honest four-dimensional world, with little
modification from our two-dimensional model.

The first postulate of special relativity is to keep Newton’s law which asserts 
that particles not subject to any forces will move along straight lines. Thus the 
geometry of our spacetime singles out the straight lines among all possible curves. 
Our spacetime is the affine plane with, perhaps, some additional geometrical 
structure. 

The second postulate is that the speed of light is a finite absolute constant. 
Thus, at each point of spacetime there are two well-defined lines representing light 
moving to the right or to the left. The spatial and temporal invariance of the speed 
of light says that translating P into Q will carry the two light rays through P into 
the two light rays through Q. 



t 



Figure 4.18 


We want to investigate those affine transformations that carry light rays into 
light rays. Since translations do, we are reduced to investigating which linear 
transformations preserve the light rays through the origin. We are thus given two 
lines x= ±ct, and ask for the linear transformations which preserve these lines. 
In doing our computations, it will be convenient to introduce natural units of 
length and time so that the speed of light is unity. For example, we could measure 
t in years and x in light-years. Or, if we choose a nanosecond (10^{-9} seconds) as
the unit of time, then the corresponding unit of length is one foot to remarkable
accuracy. So we could introduce natural units by measuring t in nanoseconds and
x in feet.

We thus are interested in studying those linear transformations which preserve
the figure given by the pair of lines x = t and x = -t:







all send the pair of lines x = ±t into x = ±t, possibly interchanging the lines. They
interchange the various four regions of the plane as shown. So, by multiplying by
one of them, we can arrange that the transformations we wish to study preserve
each of the four regions. Thus we are looking at linear transformations, F, of the
plane that preserve each of the lines x = t and x = -t and the forward region
t^2 > x^2, t > 0.


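To study such transformations, we might as well pass to coordinates in which
these lines become the coordinate axes:

\begin{pmatrix} t \\ x \end{pmatrix} = R \begin{pmatrix} q \\ p \end{pmatrix},   R = \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix},   R^{-1} = (1/2)\begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix},

so that

q = (1/2)(t + x),   p = (1/2)(x - t).

Figure 4.23

Thus R^{-1}FR is a transformation that carries each of the coordinate axes into itself,
hence a diagonal matrix, and it preserves each of the quadrants bounded by the
axes. Thus

R^{-1}FR = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix},   a > 0, d > 0.

Set ad = s^2 and a/d = r^2, with s > 0 and r > 0, so

R^{-1}FR = \begin{pmatrix} s & 0 \\ 0 & s \end{pmatrix} \begin{pmatrix} r & 0 \\ 0 & r^{-1} \end{pmatrix}.

Therefore we have proved that

F = S L_r

where

S = \begin{pmatrix} s & 0 \\ 0 & s \end{pmatrix}   and   L_r = R \begin{pmatrix} r & 0 \\ 0 & r^{-1} \end{pmatrix} R^{-1}.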




We claim that

<L_r v_1, L_r v_2> = <v_1, v_2>   (4.3)

for any pair of vectors v_1, v_2. Indeed, by the analogue of (4.1) for the scalar product
< , >, it is sufficient to prove that

Q(v) = Q(L_r v).

Now

Q(v) = t^2 - x^2 = -4pq

and if v' = L_r v has coordinates q' and p', then

q' = rq,   p' = r^{-1} p

and

Q(v') = -4p'q' = -4pq = Q(v)

which is what we wanted to prove.


The effect of the scale transformation S is to multiply all lengths and time
measurements by a factor of s. The existence of atomic clocks, which provide a definite
unit of time, shows that S, for s \ne 1, is not a symmetry of nature.

A linear transformation A which satisfies

Q(Av) = Q(v) for all v in V

is called a Lorentz transformation. Such an A must carry the light cone (also called
the null cone)

{v | Q(v) = 0}

into itself, i.e., preserve the set {x = ±t}. If, in addition, A carries the forward region
into itself, it must be a proper Lorentz transformation

A = L_r,

for some r.

The proper Lorentz transformations can be characterized among all Lorentz
transformations by the property that they can be continuously deformed to the
identity through a family of Lorentz transformations. Indeed, let A(t) be a family
of Lorentz transformations with A(0) = I, A(1) = A. Let v be some point in the
forward region. Then A(t)v cannot cross the null cone since Q(A(t)v) = Q(v) > 0.
Similarly Det A = ±1 for any Lorentz transformation, since A times one of the matrices

\begin{pmatrix} \pm 1 & 0 \\ 0 & \pm 1 \end{pmatrix}

is a proper Lorentz transformation, and Det L = 1 for a proper Lorentz transformation L. Thus since Det A(t) varies
continuously with t and Det A(0) = 1, we must have Det A(t) = 1 for all t.
Thus, if A can be continuously deformed to the identity, it is proper.

The product of two proper Lorentz transformations is again a proper Lorentz
transformation. Indeed, if

L_r = R \begin{pmatrix} r & 0 \\ 0 & r^{-1} \end{pmatrix} R^{-1}   and   L_{r'} = R \begin{pmatrix} r' & 0 \\ 0 & r'^{-1} \end{pmatrix} R^{-1}

then

L_r L_{r'} = R \begin{pmatrix} r & 0 \\ 0 & r^{-1} \end{pmatrix} R^{-1} R \begin{pmatrix} r' & 0 \\ 0 & r'^{-1} \end{pmatrix} R^{-1}

so

L_r L_{r'} = L_{rr'}.   (4.4)

It is convenient to write r = e^a and set

L^a = L_r = (1/2)\begin{pmatrix} e^a + e^{-a} & e^a - e^{-a} \\ e^a - e^{-a} & e^a + e^{-a} \end{pmatrix}.

Then

L^a L^{a'} = L^{a + a'}.   (4.5)


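Sometimes, the hyperbolic functions

\cosh a = (1/2)(e^a + e^{-a}),   \sinh a = (1/2)(e^a - e^{-a})

are used, so that

L^a = \begin{pmatrix} \cosh a & \sinh a \\ \sinh a & \cosh a \end{pmatrix}.

Then the Lorentz transformations L^a look very much like the rotations R_\theta:

L^a = \begin{pmatrix} \cosh a & \sinh a \\ \sinh a & \cosh a \end{pmatrix}   while   R_\theta = \begin{pmatrix} \cos \theta & -\sin \theta \\ \sin \theta & \cos \theta \end{pmatrix}.

We have the multiplication formulas

L^{a_1} L^{a_2} = L^{a_1 + a_2}   while   R_{\theta_1} R_{\theta_2} = R_{\theta_1 + \theta_2}.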

As we let a vary, the point L^a v moves along a hyperbola, except in the limiting
case where v lies on the light cone, in which case L^a v moves in or out along the
light cone (unless v = 0, when L^a v = 0 for all a). It is for this reason that the functions
cosh and sinh are called hyperbolic functions, with cosh called the hyperbolic cosine
and sinh the hyperbolic sine. As we let \theta vary, the point R_\theta v moves along a circle,
except for v = 0 which stays fixed. This is why cos and sin are called circular
functions.

Euclidean geometry is concerned with those properties of the
plane which are invariant under all Euclidean transformations. A Poincare
transformation of the plane is a transformation of the form

v -> Lv + w

where L is a Lorentz transformation and w is a fixed vector. The geometry of special relativity is concerned
with those properties which are invariant under all Poincare transformations. To
be parallel to the y-axis is not a Euclidean property of a line l: if l is parallel to
the y-axis, then Rl will not be parallel to the y-axis if R is a rotation other than
through 0° or 180°. Similarly, to be parallel to the x-axis is not an admissible
property of a line in special relativity; if l is parallel to the x-axis, then Ll will not
be, for any proper Lorentz transformation L other than the identity. This last
assertion is usually formulated by saying that ‘the notion of simultaneity does not
make sense for spatially separated points in the theory of special relativity’.

Similarly, the notion of a particle ‘being at rest’ makes no sense. We might want
to say that the line x = 0, the t-axis, represents a stationary particle at the origin.
But the Lorentz transformation L_r carries this line into the line through the
origin and

L_r \begin{pmatrix} 1 \\ 0 \end{pmatrix} = (1/2)\begin{pmatrix} r + r^{-1} \\ r - r^{-1} \end{pmatrix}.

Thus L_r applied to the t-axis is the line

x = vt

where

v = (r - r^{-1})/(r + r^{-1}) = (r^2 - 1)/(r^2 + 1).

This now looks like the line of a particle moving with constant velocity v. We can
solve the equation

v = (r^2 - 1)/(r^2 + 1)

for r in terms of v:

r = \sqrt{(1 + v)/(1 - v)}

as can easily be checked. We can, if we like, use v as a parameter to describe L: define

L(v) = L_r = L^a

where

r = \sqrt{(1 + v)/(1 - v)} = e^a,

so that

v = (r - r^{-1})/(r + r^{-1}) = (e^a - e^{-a})/(e^a + e^{-a}) = \sinh a / \cosh a = \tanh a.
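
The relation between the parameters r and v can be explored numerically by multiplying two such matrices and reading off the velocity of the product. The sketch below assumes the matrix form L_r = (1/2)[[r + 1/r, r - 1/r], [r - 1/r, r + 1/r]] acting on (t, x); the helper names and the sample velocities are illustrative. In this example the velocity of the product comes out as (v_1 + v_2)/(1 + v_1 v_2) rather than v_1 + v_2.

```python
import math


def boost_from_r(r):
    """Assumed form of L_r acting on column vectors (t, x)."""
    c, s = 0.5 * (r + 1.0 / r), 0.5 * (r - 1.0 / r)
    return ((c, s), (s, c))


def r_from_velocity(v):
    """r = sqrt((1 + v) / (1 - v)), valid for |v| < 1 in natural units."""
    return math.sqrt((1.0 + v) / (1.0 - v))


def velocity_of(L):
    """Slope x/t of the image of the t-axis, i.e. of L applied to (1, 0)."""
    return L[1][0] / L[0][0]


def mat_mul(A, B):
    """Product of two 2x2 matrices given as tuples of rows."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


v1, v2 = 0.6, 0.7                      # illustrative velocities, |v| < 1
L1 = boost_from_r(r_from_velocity(v1))
L2 = boost_from_r(r_from_velocity(v2))
combined = velocity_of(mat_mul(L1, L2))
rapidity = math.log(r_from_velocity(v1))   # a with r = e^a
print(combined, math.tanh(rapidity))
```

The combined velocity stays below 1 however close v_1 and v_2 are to the speed of light, while the parameter a simply adds.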


Notice that if

r = ((1 + v)/(1 - v))^{1/2}   and   r' = ((1 + v')/(1 - v'))^{1/2}

then

rr' = ((1 + v)(1 + v') / ((1 - v)(1 - v')))^{1/2}.

But

(1 + v)(1 + v') / ((1 - v)(1 - v')) = (1 + (v + v')/(1 + vv')) / (1 - (v + v')/(1 + vv'))

so

L(v)L(v') = L((v + v')/(1 + vv')).   (4.7)

This is the addition of velocity law in special relativity.

We are thus using three different parametrizations of the same proper Lorentz
transformation:

L_r = L^a = L(v)

where

r = e^a = \sqrt{(1 + v)/(1 - v)}.

The formula for multiplying two of them is given by equations (4.4), (4.5), or (4.7),
depending on the parametrization.

We have shown that the linear transformations of special relativity preserve the
quadratic form Q(v). But we have not yet given a direct physical interpretation of Q(v).

Here is one involving only light rays and clocks: Consider the points t_1 and t_2 on
the t-axis which are joined to v = (t, x)^T by light rays (lines parallel to t = x and t = -x).

Figure 4.25

Then

t - t_1 = x   or   t_1 = t - x

and

t_2 - t = x   or   t_2 = t + x

so

t_1 t_2 = t^2 - x^2 = Q(v).

Thus Q(v) can be measured as follows. An observer at rest at the origin, carrying a
clock, wishes to communicate with v. It records the time t_1 at which a light signal
must be emitted in order that it will reach v, and records the time t_2
when the return signal, issued immediately, is received. The product, t_1 t_2, is the
Minkowski distance Q(v) between the two events. Notice that if v lies on the line
x = 0 then t_1 = t_2 = t since the transmission will take no time at all. If v lies on a
light ray through 0, then t_1 = 0 or t_2 = 0. If Q(v) < 0, then t_1 < 0 and t_2 > 0.


Figure 4.26 


Here is another important property of the geometry of Minkowski space: Recall
that in Euclidean geometry, we have the triangle inequality

||u + v|| \le ||u|| + ||v||,

with equality only if u and v lie on the same line and point in the same direction.
This is illustrated in figure 4.27.

c = a + b,   a = ||u||,   b = ||v||.

The broken path is clearly longer than the straight line. This shows that ‘the
straight line is the shortest distance between two points’ in Euclidean geometry.

Now let us consider a similar diagram in our spacetime geometry, where the
circles

||A - P||^2 = a^2   and   ||B - R||^2 = b^2

are replaced by hyperbolas Q(A - P) = a^2 and Q(B - R) = b^2. But now for any
segments l and m which give a broken path from P to R we have

Q(l) < a^2   and   Q(m) < b^2.

Now Q(l) is just the square of the length of time elapsed on a clock moving
uniformly along l. We thus obtain the reverse triangle inequality.

The time measured by a clock moving uniformly from P to R will be longer
than the time measured by a clock moving along any broken path from P
to R.


This is called the twin effect. The twin moving along the broken paths (if he 
survives the bumps) will be younger than the twin moving uniformly from P to 
R. This is sometimes known as the twin paradox. It is, of course, no paradox, just 
an immediate corollary of the reverse triangle inequality. 
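
A small numerical instance makes the reverse triangle inequality concrete. The sketch below takes the time elapsed along a straight segment to be the square root of Q of the displacement, in natural units; the events P, A, R and the function name are illustrative choices, with the travelling twin moving at speed 0.8 out and back.

```python
import math


def elapsed(p, q):
    """Clock time along the straight segment from event p to event q.

    Events are (t, x) pairs in natural units; the segment must be timelike,
    so that Q(q - p) = dt^2 - dx^2 is not negative.
    """
    dt, dx = q[0] - p[0], q[1] - p[1]
    return math.sqrt(dt * dt - dx * dx)


P, R = (0.0, 0.0), (10.0, 0.0)    # the twin who moves uniformly from P to R
A = (5.0, 4.0)                    # turning point of the twin on the broken path

straight = elapsed(P, R)                    # time on the uniformly moving clock
broken = elapsed(P, A) + elapsed(A, R)      # time on the clock along P -> A -> R
print(straight, broken)
```

Each leg of the broken path contributes sqrt(25 - 16) = 3, so the travelling twin ages 6 units while the other ages 10.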


4.7. The Poincare group and the Galilean group 

So far we have been describing the transformations of Euclidean geometry and of 
special relativity in terms of natural units. The points of spacetime are sometimes 
called events. They record when and where something happens. If we record the 
total events of a single human consciousness (say roughly 70 years measured in 
seconds) and several hundred or thousand meters measured in seconds we get a 


set of events which is enormously stretched out in one particular time direction
compared to the space directions, by a factor of something like 10^{18}.

Figure 4.29

Being very skinny in the space direction as compared with the time direction, we tend to have
a preferred splitting of spacetime with space and time picked out, and to measure
distances in space with much smaller units (such as meters) than the units we use
(such as seconds) to measure time. Of course, if we use a small unit the corresponding
numerical value of the measurement will be large; that is, in terms of human
or ‘ordinary’ units, the space distances will be greatly magnified in comparison to
the time differences. This suggests that we consider variables T and X related to
the natural units t and x by T = t and X = cx, or

\begin{pmatrix} T \\ X \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & c \end{pmatrix} \begin{pmatrix} t \\ x \end{pmatrix}

where c is a large number. The light cone |x| = |t| goes over into c^{-1}|X| = |T| or

|X| = c|T|.

We say that ‘the speed of light is c in ordinary units’. Similarly, the hyperbola
t^2 - x^2 = k goes over into the curve T^2 - c^{-2} X^2 = k; the ‘timelike hyperbolas’
corresponding to k > 0 look very flattened out, almost like vertical straight lines
for small values of X.

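Let us see how to express a Lorentz transformation in terms of ordinary units.
We do this as follows: we pick a point (T, X)^T, find the point

\begin{pmatrix} t \\ x \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & c^{-1} \end{pmatrix} \begin{pmatrix} T \\ X \end{pmatrix}

that it corresponds to, then apply the Lorentz transformation L to (t, x)^T to obtain

L \begin{pmatrix} 1 & 0 \\ 0 & c^{-1} \end{pmatrix} \begin{pmatrix} T \\ X \end{pmatrix}

and then express this new vector in ordinary units by multiplying by
\begin{pmatrix} 1 & 0 \\ 0 & c \end{pmatrix}. Thus, in ordinary units, the Lorentz transformation L = L^a is given by
the matrix

\begin{pmatrix} 1 & 0 \\ 0 & c \end{pmatrix} L^a \begin{pmatrix} 1 & 0 \\ 0 & c^{-1} \end{pmatrix} = \begin{pmatrix} \cosh a & c^{-1} \sinh a \\ c \sinh a & \cosh a \end{pmatrix}.

The corresponding velocity in ordinary units is v = c \tanh a. If v is small in
comparison with c, then

\cosh a = 1 + (1/2) v^2/c^2 + ...,   c \sinh a = v + ...,   c^{-1} \sinh a = v/c^2 + ...,

so the matrix can be written as

\begin{pmatrix} 1 & 0 \\ v & 1 \end{pmatrix} + E

where the entries of E are all of order c^{-2}. The matrix

G_v = \begin{pmatrix} 1 & 0 \\ v & 1 \end{pmatrix}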

is called a velocity transformation corresponding to velocity v. It preserves the lines
T = constant; in fact

G_v \begin{pmatrix} T \\ X \end{pmatrix} = \begin{pmatrix} T \\ X + vT \end{pmatrix}.

We thus see that the velocity transformations can be regarded as ‘limiting cases’ of
Lorentz transformations. When
considering the velocity of light to be very large, the timelike hyperbolas go over
into vertical straight lines, and Lorentz transformations with small values of a
go over into velocity transformations. The collection of all velocity transformations
also forms a group,

G_{v_1} G_{v_2} = G_{v_1 + v_2},

as can easily be checked, and this group preserves the notion of simultaneity.

A transformation of the form

\begin{pmatrix} T \\ X \end{pmatrix} -> G_v \begin{pmatrix} T \\ X \end{pmatrix} + \begin{pmatrix} T_0 \\ X_0 \end{pmatrix}

is known as a Galilean transformation. Thus a Galilean transformation is a
translation composed with a velocity transformation.

It took Einstein to recognize that our notion of simultaneity is only approximate, and
that the velocity transformation G_v = \begin{pmatrix} 1 & 0 \\ v & 1 \end{pmatrix} must be regarded as an
approximation to the Lorentz transformation:

\begin{pmatrix} 1/\sqrt{1 - v^2/c^2} & (v/c^2)/\sqrt{1 - v^2/c^2} \\ v/\sqrt{1 - v^2/c^2} & 1/\sqrt{1 - v^2/c^2} \end{pmatrix}

(expressed in ordinary units).
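
How good the approximation is can be seen by comparing the two matrices entry by entry as c grows. The sketch below assumes the Lorentz matrix in ordinary units has entries 1/\sqrt{1 - v^2/c^2}, (v/c^2)/\sqrt{1 - v^2/c^2}, v/\sqrt{1 - v^2/c^2} and 1/\sqrt{1 - v^2/c^2}, and that G_v sends (T, X) to (T, X + vT); the function names and numerical values are illustrative.

```python
import math


def lorentz_ordinary(v, c):
    """Assumed Lorentz transformation acting on (T, X) in ordinary units."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return ((g, g * v / c ** 2), (g * v, g))


def velocity_transformation(v):
    """G_v sends (T, X) to (T, X + vT)."""
    return ((1.0, 0.0), (v, 1.0))


def max_entry_gap(v, c):
    """Largest difference between corresponding entries of the two matrices."""
    L, G = lorentz_ordinary(v, c), velocity_transformation(v)
    return max(abs(L[i][j] - G[i][j]) for i in range(2) for j in range(2))


v = 30.0                                   # illustrative speed in ordinary units
gaps = [max_entry_gap(v, c) for c in (1e3, 1e4, 1e5)]
print(gaps)
```

Each tenfold increase in c shrinks the discrepancy by about a factor of one hundred, as expected for an error of order c^{-2}.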


4.8. Momentum, energy and mass 

The passage from the Galilean group to the Poincare group required a
reformulation of the basic concepts of mechanics. The outline for such a theory
was pointed out by Poincare in his address to the World’s Fair in St Louis
in 1904 and was carried out by him, and, independently, by Einstein.

Consider a collision between two particles, A and B.
Let p_A denote the momentum of particle A before the collision, and p'_A denote its
momentum after the collision. Similarly for particle B. The law of conservation of
momentum says that

p_A + p_B = p'_A + p'_B   (conservation of momentum).


The collision is called elastic if the total kinetic energy is conserved. An example of an 
inelastic collision is one where the particles get stuck together upon impact. 
Conversely, if two particles are initially in contact, and at rest, say, with an explosive 
charge between them, when the charge is exploded the particles will move apart. 
This can be regarded as a reverse ‘collision’: if we ran a film of it backwards, it would 
look like two particles colliding and sticking together. Total kinetic energy is not 
conserved - the total kinetic energy was zero before the explosion and positive after 
the particles were set in motion. The energy released by the explosion was converted 
into kinetic energy. Similarly, we believe that when two particles collide and stick 
together, kinetic energy is converted into energy of some other form; heat or 
potential energy. For an inelastic collision one still has the law of conservation of 
momentum. In an elastic collision, there is no exchange between kinetic and other
forms of energy so the total kinetic energy is conserved:

E_A + E_B = E'_A + E'_B   (conservation of energy)

where E_A denotes the kinetic energy of particle A before the collision, E'_A its kinetic
energy after the collision, etc.






the definition of momentum and of energy: 

In Newtonian mechanics, the momentum of a particle is defined as 

p = mv 

where v is the velocity of a moving particle and m is its mass. The velocity (and hence 
the momentum) is a vector in three-dimensional space. In our model universe it will 
be considered as one-dimensional. (Alternatively, we can consider particles
constrained to move on a line.) The mass can, in principle, be defined by the following
series of experiments. Suppose we have a collection of objects - say little balls made
of different materials. We consider two held together at rest and then pushed apart by
an explosion set off between them or by a spring released between them. One object
will then move to the right and the other to the left. If the two objects are identical,
and each rebounds from a wall at the same distance from the starting point, they


igure 4.; 
will bounce back and collide with one another at precisely the initial point of
explosion. We can perform the experiment and observe that this is indeed the case.

If instead the ball on the right is larger than the ball on the left, both being made
of the same material, the point of collision will be to
the right of center - the smaller ball will have travelled further. We can then perform
the same experiment with balls of differing materials. For example, we will find that if 
we use two balls of the same diameter, one of lead on the right and one of aluminum 
on the left, the point of collision will be to the right. On the other hand, if we take a 
very small ball of lead on the right with our fixed size ball of aluminum on the left, we 
will find that the point of collision will be to the left. Assuming that we have enough 
sizes of balls of lead, we will find a lead ball which exactly matches the aluminum 
ball. 

We can now compare lead balls with copper balls, say. Suppose we found an
aluminum ball that matches a lead ball (in the sense that the point of recollision is at
the center) and a copper ball that matches the lead ball. We can then compare the
aluminum ball with the copper ball. It is a law of nature that they, too, will match. We
can now define the notion of mass by declaring that two objects have the same mass if
they match in our explosion-collision experiment. The law of nature referred to
above is then the assertion that this notion of mass is well defined - if A has the same
mass as B and B has the same mass as C, then A has the same mass as C. We can
observe the following law of nature: If A_1 matches B_1 and A_2 matches B_2, then
experiments
show that A_1 and A_2 together match B_1 and B_2 together. (Alternatively, we could also observe that the


Figure 4.32 

mass of a ball of the same material is proportional to its volume: if a ball of radius r A 
of lead matches a ball of radius r B of copper, then a ball of radius 3 r A of lead will 
match 27 balls of radius r B of copper.) This allows us to introduce units of mass: 
having fixed one object, say a lead ball of volume 1 cm^3, we can then compare any
other object with a multiple of our given object (a lead ball of volume m) and this 
assigns, in a well-defined way, a numerical value to any mass. Originally, in the 
metric system the gram was taken to be the mass of the 1 cubic centimeter of water of 


* We will also find as a law of nature that turning the apparatus around - that is, interchanging 
right and left - will not affect the matching or non-matching properties of objects. 



4 °C. Since water at 4 °C is difficult to work with in our collision experiments, we
might want to define the gram as the mass of a ball of copper whose volume is 0.11...
cm^3. It is interesting to observe that in the above series of experiments we did not
need any clocks.

We now return to the conservation of momentum. In Newtonian mechanics
this law says that if we define momentum by

p = mv 

then the total momentum is conserved. (In fact, it is not hard to show that this 
version of the law of conservation of momentum is a consequence of our definition 
of mass and of the assumption that the laws of nature are invariant under the 
Galilean group. See Feynman’s Lectures on Physics, I, Chapter 10 for a very lucid 
presentation of this argument.) In special relativity this definition of momentum
makes no sense because velocity makes no sense. After all, velocity is defined as

v = dx/dt

and this presupposes that we have chosen x and t axes and have decided to
parametrize the curve describing the motion of the particle by t - that is why we
are writing the curve as x(t). If we apply a Lorentz transformation, we will get
different t'- and x'-axes and hence a different velocity, v'. Let us put the problem
another way. Suppose we decide to parametrize the curve describing the motion
of the particle in spacetime by some neutral third parameter, s. For example, s
might be the reading on some internal clock that the particle might be carrying
along with it on its motion. Thus the curve in our spacetime plane is given by

u(s) = \begin{pmatrix} t(s) \\ x(s) \end{pmatrix}.

At some instant s_0, we can compute the tangent vector

w = u'(s_0) = \begin{pmatrix} a \\ b \end{pmatrix},   a = (dt/ds)(s_0),   b = (dx/ds)(s_0).

Figure 4.33

The velocity v = dx/dt is then given by

v = b/a

in the given coordinate system. It is clear that
the ratio v = b/a makes no sense in that if we replace w by

w' = Lw

and write

w' = \begin{pmatrix} a' \\ b' \end{pmatrix},   v' = b'/a'

then (unless v = ±1 or L = I) v' will not be equal to v. The one property of w that is
conserved is

Q(w) = a^2 - b^2.

The condition Q(w) > 0 is the same as the condition |v| < 1. Since 1 is the speed
of light in our units, it does make sense to say that the velocity v is less than the
speed of light.

It is an experimental fact that all particles with positive rest mass (defined below)
move at speeds less than the speed of light - that, for them, Q(w) > 0. Let us write

Q(w) = \mu^2 > 0,

so that

a^2 - b^2 = \mu^2   and   b/a = v.

The curve itself determines only
the ratio of b to a. (This is a reflection of the fact that we have not really specified
the mysterious parameter s in the curve u(s).) But we can solve the two equations
a^2 - b^2 = \mu^2 and b/a = v to get

a = \mu/\sqrt{1 - v^2},   b = \mu v/\sqrt{1 - v^2}

in a given spacetime splitting. For small values of v we have the Taylor expansion

1/\sqrt{1 - v^2} = 1 + (1/2)v^2 + ...

so

b \approx \mu v   and   a \approx \mu + (1/2)\mu v^2.

The expression \mu v looks like the momentum, and (1/2)\mu v^2 like the kinetic energy, of Newtonian
mechanics. We are thus led to the following modification of the definitions of
energy and momentum. Associated to any object there is a definite value of \mu,
called its rest mass and denoted by m_0. This value
coincides (up to a choice of units, of course) with the rest mass defined
experimentally above. Suppose that m_0 > 0 (as we have been implicitly assuming). Then,
when the object is in motion, its energy-momentum vector is defined to be the
unique vector

w = \begin{pmatrix} E \\ p \end{pmatrix}

such that

Q(w) = E^2 - p^2 = m_0^2

and

w is a scalar multiple of u' = du/ds, where u(s) is the curve describing the
motion of the object in spacetime.

In terms of a given spacetime splitting where

u(s) = \begin{pmatrix} t(s) \\ x(s) \end{pmatrix}

we have

v = dx/dt,   p = m_0 v/\sqrt{1 - v^2},   E = m_0/\sqrt{1 - v^2}.

In particular, if the object is at rest in a spacetime splitting so that v = 0, then

p = 0 and E = m_0

in that system of coordinates.

The law of conservation of energy-momentum now says that 


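$$\begin{pmatrix} E_A \\ p_A \end{pmatrix} + \begin{pmatrix} E_B \\ p_B \end{pmatrix} = \begin{pmatrix} E'_A \\ p'_A \end{pmatrix} + \begin{pmatrix} E'_B \\ p'_B \end{pmatrix}$$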


at any collision - a con 



We have written all of the above equations in terms of natural units where the speed of light is one and $v$ is a number, so an expression such as $\sqrt{1 - v^2}$ makes sense. If we use ‘psychological units’, then $v$ is not a number but a velocity expressed in cm/s, for example. So an expression such as $\sqrt{1 - v^2}$ makes no sense as it stands. We must replace it by $\sqrt{1 - v^2/c^2}$. To make $p$ look as it should in the small $v$ approximation, we must write

$$p = \frac{m_0 v}{\sqrt{1 - v^2/c^2}}.$$

Similarly, to make the units of $E$ and the kinetic energy term come out right, we must write

$$E = \frac{m_0 c^2}{\sqrt{1 - v^2/c^2}}.$$

This is the appropriate rescaling. For the particle at rest, we get the famous Einstein mass-energy relation

$$E = m_0 c^2.$$
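
The formulas of this section lend themselves to a quick numerical check. The following is a minimal sketch, not taken from the text: the function name `energy_momentum` and the sample values of $m_0$, $v$ and $c$ are illustrative assumptions. It verifies that $E^2 - p^2 = m_0^2$ in natural units, that $E$ and $p$ are close to $m_0 + \tfrac12 m_0 v^2$ and $m_0 v$ for small $v$, and that a particle at rest has $E = m_0 c^2$.

```python
import math

def energy_momentum(m0, v, c=1.0):
    """Return (E, p) for rest mass m0 and velocity v with |v| < c (hypothetical helper)."""
    gamma = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return m0 * c**2 * gamma, m0 * v * gamma

# Natural units (c = 1): Q(w) = E^2 - p^2 should equal m0^2.
E, p = energy_momentum(m0=2.0, v=0.6)
print(E, p, E**2 - p**2)

# Small v: compare with m0 + (1/2) m0 v^2 and m0 v.
E_small, p_small = energy_momentum(m0=2.0, v=0.01)
print(E_small - (2.0 + 0.5 * 2.0 * 0.01**2), p_small - 2.0 * 0.01)

# Particle at rest in units where c is not 1 (sample value of c in cm/s).
E_rest, p_rest = energy_momentum(m0=2.0, v=0.0, c=3.0e10)
print(E_rest, p_rest)
```

The differences printed in the second step are of order $v^4$ and $v^3$ respectively, which is what the Taylor expansion predicts.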


4.9. Antisymmetric forms


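We have studied two kinds of scalar product in the plane: the Euclidean scalar product defined by

$$(w, w') = xx' + yy' \quad\text{where } w = \begin{pmatrix} x \\ y \end{pmatrix} \text{ and } w' = \begin{pmatrix} x' \\ y' \end{pmatrix},$$

and the Lorentz scalar product defined by

$$(v, v') = tt' - xx' \quad\text{where } v = \begin{pmatrix} t \\ x \end{pmatrix} \text{ and } v' = \begin{pmatrix} t' \\ x' \end{pmatrix}.$$
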

Both of these scalar products are bilinear, that is, when one variable is held fixed, we get a linear function of the other:

$$(aw_1 + bw_2, w') = a(w_1, w') + b(w_2, w')$$

and so on. Also, both of these scalar products are symmetric:

$$(w, w') = (w', w)$$

and

$$(v, v') = (v', v).$$

We now introduce a third kind of product between two vectors in the plane which is bilinear, but antisymmetric: we define

$$\omega(v, v') = qp' - q'p = \operatorname{Det}\begin{pmatrix} q & q' \\ p & p' \end{pmatrix} \quad\text{where } v = \begin{pmatrix} q \\ p \end{pmatrix} \text{ and } v' = \begin{pmatrix} q' \\ p' \end{pmatrix}.$$

Here

$$\omega(v, v') = -\omega(v', v)$$

which is what we mean by antisymmetric. The geometric meaning of $\omega(v, v')$ is clear; it is the oriented area of the parallelogram spanned by $v$ and $v'$. It is also clear that $\omega(v, v')$ is bilinear. Such an $\omega$ is called a symplectic scalar product.

Figure 4.34.

A linear transformation, $A$, is called symplectic if it preserves the scalar product $\omega$. Thus $A$ is symplectic if and only if

$$\omega(Av, Av') = \omega(v, v')$$

for all $v$ and $v'$. The matrix whose columns are $Av$ and $Av'$ is just the product of the matrix $A$ with the matrix $\begin{pmatrix} q & q' \\ p & p' \end{pmatrix}$. Therefore

$$\omega(Av, Av') = \operatorname{Det}\left[A\begin{pmatrix} q & q' \\ p & p' \end{pmatrix}\right] = (\operatorname{Det} A)\operatorname{Det}\begin{pmatrix} q & q' \\ p & p' \end{pmatrix} = (\operatorname{Det} A)\,\omega(v, v').$$

Thus $A$ is symplectic if and only if $\operatorname{Det} A = 1$. Any symplectic matrix clearly has an inverse which is again symplectic and the product of two symplectic matrices is again symplectic. Thus the collection of all $2 \times 2$ symplectic matrices forms a group, called the (two-dimensional) symplectic group. The symplectic group plays a very important role in the study of optics, as we shall see in Chapter 9.
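
The determinant criterion can be illustrated with a short computation. This is a sketch under stated assumptions: the helper names `omega`, `apply` and `det`, and the sample matrices and vectors, are chosen here for illustration and do not come from the text. It checks that a matrix of determinant 1 preserves $\omega$, while a matrix of determinant 2 doubles it.

```python
def omega(v, w):
    """Symplectic scalar product q p' - q' p of v = (q, p) and w = (q', p')."""
    q, p = v
    q2, p2 = w
    return q * p2 - q2 * p

def apply(A, v):
    """Apply the 2 x 2 matrix A (given as two rows) to the vector v."""
    (a, b), (c, d) = A
    return (a * v[0] + b * v[1], c * v[0] + d * v[1])

def det(A):
    (a, b), (c, d) = A
    return a * d - b * c

v, w = (1.0, 4.0), (-2.0, 5.0)
A = ((2.0, 3.0), (1.0, 2.0))   # Det A = 1, so A is symplectic
B = ((2.0, 0.0), (0.0, 1.0))   # Det B = 2, so B is not symplectic

print(omega(v, w))                        # omega before transforming
print(omega(apply(A, v), apply(A, w)))    # unchanged by A
print(omega(apply(B, v), apply(B, w)))    # multiplied by Det B
```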


A 





You should be able to list and apply the properties of a Euclidean scalar product.
You should be able to write down the transpose of a matrix and to apply the
transpose operation in connection with scalar products and Euclidean
transformations.

Given a vector space of 2 or more dimensions, with a Euclidean scalar product,
you should know how to use the Gram-Schmidt process to construct an orthonormal
basis and to find the orthogonal projection onto a subspace.


B Quadratic forms 

You should be able to express a quadratic form $Q(v)$ in terms of a symmetric
matrix $A$ and relate maximum and minimum values of $Q$ to the eigenvectors and
eigenvalues of $A$.

Given a quadratic form Q on the plane, you should be able to introduce 







w = — 




eigenvalues of A. 

D Lorentz scalar product 

You should be able to calculate the Lorentz scalar product of two vectors, identifying
Lorentz transformations that preserve this scalar product, and apply these
concepts to the special theory of relativity.


Exercises 

4.1. (a) Using the three properties of the scalar product (symmetry, linearity, positive-definiteness), prove the Cauchy-Schwarz inequality

$$|(v, w)| \le \sqrt{(v, v)(w, w)}$$

for any pair of vectors $v$ and $w$.

(Hint: Consider $(v - \alpha w, v - \alpha w)$. This is a quadratic polynomial in $\alpha$, but it cannot have any real roots unless $v = \alpha w$.)

(b) Prove the triangle inequality

$$\| v + w \| \le \| v \| + \| w \|$$

(where $\| v \|^2 = (v, v)$, etc.) (Hint: square both sides and use (a).)

4.2. (a) Let $v$ and $v'$ be two vectors in the plane. Show that a rotation $R_\theta$ through an angle $\theta$ for which

$$\cos\theta = \frac{(v, v')}{\sqrt{(v, v)(v', v')}}$$


will carry v into a multiple of v'. Determine the angle between II and 


(b) Let $v$ and $v'$ be two vectors in two-dimensional spacetime which are either both spacelike, both forward timelike, or both backward timelike. Show that a proper Lorentz transformation $L_\alpha$ for which

$$\cosh\alpha = \frac{\{v, v'\}}{\sqrt{\{v, v\}\{v', v'\}}}$$

will carry $v$ into a multiple of $v'$.

Use this result to find a Lorentz transformation which carries f ^ into 


What goes wrong if $v$ is spacelike but $v'$ is timelike? If $v$ is forward
timelike but $v'$ is backward timelike? If $v$ or $v'$ is lightlike?

4.3. For practice with the Lorentz scalar product, consider the following
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(A 


V! =| 


. V 2 = 

i —t, 

, v 3 = 


» V 4 = 

UJ 



(a) Calculate the Lorentz scalar product {v, v} of each vector with itself. 
Plot each vector on a spacetime diagram and identify each as 
spacelike, forward or backward lightlike, or forward or backward 
timelike. 

(b) Calculate the Lorentz scalar products $\{v_2, v_3\}$, $\{v_5, v_6\}$, and $\{v_3, v_6\}$.

(c) Calculate the vectors $w_1, \ldots, w_6$ which result from applying the
Lorentz transformation

■’-(!;) 

to each of the vectors $v_1, \ldots, v_6$. Plot the transformed vectors on the
spacetime diagram.

(d) Calculate $\{w_2, w_2\}$, $\{w_6, w_6\}$, $\{w_2, w_3\}$, $\{w_5, w_6\}$, and $\{w_3, w_6\}$. All
these scalar products should be the same as for the corresponding $v$
vectors.

4.4. Let $S$ be a symmetric matrix with positive eigenvalues. Define a new scalar
product $[v, w]_S$ by the equation $[v, w]_S = (Sv, w)$.


$[v, w]_S$ if and only if $C^T S C = S$).

(c) Describe a procedure for constructing a matrix $B$ with the property
that if $v' = B^{-1}v$, $w' = B^{-1}w$, then $[v, w]_S = (v', w')$. Explain how, given
one matrix $B$ with this property, you could construct many others.


, 3.7 0.9 , 

4.5 In the preceding problem, let S = I I. 

(a) Find a vector v which is orthogonal to w = 


under the scalar 


product defined by $S$, so that $[v, w]_S = 0$.

(b) Construct a matrix $B$ with the properties described in 4.4(c), and
verify that with $v$ and $w$ as in part (a), $(B^{-1}v, B^{-1}w) = 0$.

(c) Construct an orthogonal projection matrix P, satisfying P 2 = P, whose 

image consists of multiples of w = ( I) and which satisfies 





product defined by S. Hint: R = 


0 - 1 


0 


satisfies R' 


I and 


preserves the ordinary scalar product. 




_ / 

4.6. Apply the following procedure to the quadratic form 


$$Q(v) = 8x^2 + 12xy + 17y^2$$

(a) Write $Q$ in the form $(Av, v)$ where $A$ is a symmetric matrix.

(b) Find the eigenvalues of $A$.

(c) Express $A$ in the form

$$A = R_\theta \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} R_\theta^{-1}.$$

(d) Find coordinates $x'$ and $y'$ such that $Q$ can be expressed in the form

$$Q(v) = 20x'^2 + 5y'^2.$$

(e) Sketch a graph of the equation $Q(v) = 20$. Indicate both the $xy$-axes
and $x'y'$-axes on the sketch.

4.7.(a) Determine the eigenvalues $\lambda_1$ and $\lambda_2$ of the matrix $S = \begin{pmatrix} 9 & 2 \\ 2 & 6 \end{pmatrix}$, and find eigenvectors $v_1$ and $v_2$ associated with these two eigenvalues.

(b) Construct a rotation matrix $R$ such that $S = R\Lambda R^{-1}$, where $\Lambda$ is diagonal.
Be sure that $R$ represents a rotation!

(c) Find new coordinates $x'$ and $y'$, linear functions of $x$ and $y$, such that

$$9x^2 + 4xy + 6y^2 = \lambda_1 x'^2 + \lambda_2 y'^2.$$


4.8.(a) Determine the eigenvalues and eigenvectors of the matrix

$$A = \begin{pmatrix} 10 & 6 \\ 6 & 10 \end{pmatrix}.$$

(b) Construct a rotation matrix $R$ and a diagonal matrix $\Lambda$ such that
$A = R\Lambda R^{-1}$.

(c) Sketch the graph of the equation 10x 2 + 12xy + 10y 2 ="2£. 

4.9.(a) Find coordinates $x'$ and $y'$ such that the quadratic form

$$Q(v) = -x^2 + 6xy + 7y^2$$

can be expressed in the form

$$Q(v) = \lambda_1 x'^2 + \lambda_2 y'^2.$$

Identify and sketch the graph of $Q(v) = 40$.

(b) Let $x$ and $y$ lie on the unit circle, so that $x = \cos\theta$, $y = \sin\theta$. Find the values
of $\theta$ for which $Q$ achieves its maximum and minimum values, and calculate
those maximum and minimum values. What is the relationship of these
answers to the answers to part (a)?

4.10. Suppose that $M$ and $K$ are both symmetric $2 \times 2$ matrices.

(a) Construct an example to show that $M^{-1}K$ is not necessarily
symmetric.

(b) Describe how to construct a symmetric matrix $B$ such that $B^2 = M^{-1}$.
Show that the matrix $S = BKB$ is symmetric, and hence can be written
$S = R\Lambda R^{-1}$, where $R$ is a rotation and $\Lambda$ is diagonal.

(c) Show that if $A = BR$, then $M^{-1}K = A\Lambda A^{-1}$. This proves that $M^{-1}K$
has real eigenvalues.

(d) Define new coordinates $x'$ and $y'$ by $\begin{pmatrix} x \\ y \end{pmatrix} = A\begin{pmatrix} x' \\ y' \end{pmatrix}$. Show that, if
$v = \begin{pmatrix} x \\ y \end{pmatrix}$, then $(Mv, v) = x'^2 + y'^2$, while $(Kv, v) = \lambda_1 x'^2 + \lambda_2 y'^2$.

(Hints: $B$ is symmetric, so $(Bv, w) = (v, Bw)$. $R$ is orthogonal, so
$(Rv, w) = (v, R^{-1}w)$.)


4.11.(a) Show that, if $A = \begin{pmatrix} 0 & \alpha \\ \alpha & 0 \end{pmatrix}$, $\exp(A)$ is a Lorentz transformation.

(b) In relativistic mechanics, the total energy $E$ and the linear
momentum $p$ of a particle of mass $m$ moving along a line form a
vector $v = \begin{pmatrix} E \\ p \end{pmatrix}$ with $\langle v, v \rangle = E^2 - p^2 = m^2$. If the particle moves so that its
acceleration is always $\alpha$ according to an observer who sees the particle as
instantaneously at rest, then $E$ and $p$ are related by

$$\frac{dE}{d\tau} = \alpha p, \qquad \frac{dp}{d\tau} = \alpha E,$$

where $\tau$ is time as measured by a clock carried along with the particle.
Solve these equations to determine



4.12. Suppose that distances along two perpendicular axes in the plane are
measured in units which differ by a large factor $c$. For example, in
considering straight lines which might be drawn along a straight superhighway
which is 1000 kilometers long (along $x$) but only 1000 centimeters
wide (along $y$), we might wish to define new ‘ordinary’ coordinates by
$X = x$ and $Y = cy$, where $c = 10^5$, so that $X$ is measured in kilometers
while $Y$ is measured in centimeters. Construct the matrix that represents a
rotation through angle $\theta$ in terms of coordinates $X$ and $Y$, and show that
for lines whose slope $Y/X$ in ordinary coordinates is a number of the order
of unity, the rotation matrix becomes a shear matrix in the limit $c \to \infty$.
Explain this phenomenon geometrically by considering what happens to
the circles $x^2 + y^2 = k$.

(Note: After working this problem, reread the discussion of the limit $c \to \infty$
for Lorentz transformations, in section 4.3.)

4.13. Calculate the symplectic scalar product $\omega(v_1, v_2)$ for the vectors



Confirm explicitly that this scalar product is preserved under the action of 
the symplectic matrix 






4.14. Consider the system of springs and masses shown in figure 4.35.

Figure 4.35

(a) Show that, if $x_1$ and $x_2$ represent displacements to the right of
equilibrium, then the motion of this system is governed by




= -H i 


x x 

,x. 


where

$$T = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix} \quad\text{and}\quad H = \begin{pmatrix} 4 & -1 \\ -1 & 1 \end{pmatrix}.$$

(b) Let $B$ be the diagonal matrix with positive entries satisfying $B^2 = T$.
Construct the matrix $A = B^{-1}HB^{-1}$, find its eigenvalues and eigenvectors,
and use them to determine the general solution to $\ddot{w} = -Aw$.

of each in terms of $\omega_0 = \sqrt{K/M}$ and by specifying the ratio $x_2/x_1$.


4.15. Consider the system of masses and springs shown in figure 4.36. Let $x_1$ and
$x_2$ denote displacements to the right of equilibrium.

Figure 4.36 (masses $2M$ and $M$ connected by springs)

(a) Determine the frequencies $\omega_a$ and $\omega_b$ of the normal modes and
determine the ratio $x_2/x_1$ for each mode.

(b) Suppose the masses are released from rest, with initial displacements
$x_1 = A$, $x_2 = 0$. Find expressions $x_1(t)$ and $x_2(t)$ that describe the
subsequent motion of the system.

4.16. A particle whose energy-momentum vector is $\begin{pmatrix} E \\ p \end{pmatrix}$ is subjected to a
Lorentz transformation represented by the matrix

$$L = \frac{1}{2}\begin{pmatrix} r + r^{-1} & r - r^{-1} \\ r - r^{-1} & r + r^{-1} \end{pmatrix}.$$

Show that the sum of its energy and momentum is multiplied by $r$, while
their difference is divided by $r$. Interpret this result in terms of eigenvectors
and eigenvalues of $L$.


4.17. A particle of mass 15 (arbitrary units) moving at velocity u = j -f (in units 



where c = 1) collides with a stationary particle whose mass is 6 units, and 
the two combine to form a single particle. 


(a) Determine the energy-momentum vector $\begin{pmatrix} E \\ p \end{pmatrix}$ for each of the colliding
particles and for the single particle formed in the collision. Thereby
determine the mass and velocity of the particle that is formed.


orentz transformation matrix i 


, wmc 


_ 4 1 

3 3 

corresponds to a velocity of fc, determine the energy-momentum 
/ £'\ 

vector ^ t j for each particle as viewed from a frame of reference 
moving to the right at speed f c. 

4.18. Suppose that two particles have energy-momentum vectors $w_1 = \begin{pmatrix} E_1 \\ p_1 \end{pmatrix}$
and $w_2 = \begin{pmatrix} E_2 \\ p_2 \end{pmatrix}$ respectively, where $m_1^2 = E_1^2 - p_1^2$, $m_2^2 = E_2^2 - p_2^2$.

(a) Write the Lorentz scalar product of these two vectors as $\{w_1, w_2\} = m_1 m_2 \cosh\alpha$. Show that $v = \tanh\alpha = \sqrt{\cosh^2\alpha - 1}/\cosh\alpha$ represents
the speed of one of these particles in a frame of reference where the
other is at rest.

(b) Determine v for the case where 



and for the case where 



4.19. In units where $c$ is not numerically equal to 1, the matrix that represents a
Lorentz transformation acting on $\begin{pmatrix} t \\ x \end{pmatrix}$ is

$$L = \begin{pmatrix} \cosh\alpha & c^{-1}\sinh\alpha \\ c\sinh\alpha & \cosh\alpha \end{pmatrix}.$$

(a) Show that the matrix that transforms $\begin{pmatrix} E \\ p \end{pmatrix}$ is the transpose of this
matrix.

(b) Show that the same matrix $L$ will serve to transform energy-momentum
if we represent it as a row vector, i.e.,

$$(E', p') = (E, p)L.$$

4.20. A photon has energy and momentum that are equal in magnitude (in units
where $c = 1$). That is, its energy-momentum vector is of the form $\begin{pmatrix} E \\ E \end{pmatrix}$ or $\begin{pmatrix} E \\ -E \end{pmatrix}$.

(a) A particle of mass $2m$, initially at rest, decays into a particle of
mass $m$ plus a photon. Use conservation of energy-momentum to
determine the speed of the particle of mass $m$ and the energy of the
photon.

(b) Use the Lorentz transformation to describe this decay process in a
frame of reference where the particle of mass $2m$ is initially moving at speed $\tfrac{3}{5}$.

4.21. A photon of energy $E_\gamma$, whose energy-momentum vector is $\begin{pmatrix} E_\gamma \\ E_\gamma \end{pmatrix}$ in units
where $c = 1$, collides with a stationary particle of mass $m_1$ to form a single
particle of mass $m_2$. Show that

$$E_\gamma = \frac{m_2^2 - m_1^2}{2m_1}.$$

4.22. Using the scalar product $(f, g) = \int_0^\infty f(t)g(t)\,dt$, construct an orthonormal
basis for the space of functions which satisfy the differential equation
$\ddot{x} + 3\dot{x} + 2x = 0$.

4.23. Construct an orthonormal basis for the subspace of $\mathbb{R}^4$ spanned by the
three vectors


Chapters 5 and 6 present the basic facts of the differential
calculus. In Chapter 5 we define the differential of a map from
one vector space to another, and discuss its basic properties,
in particular the chain rule. We give some physical applications
such as Kepler motion and the Born approximation.
We define the concepts of directional and partial derivatives,
and linear differential forms.


Introduction 


Our first goal is to develop the differential calculus for four types of
functions:

(i) functions from $\mathbb{R}^1 \to \mathbb{R}^2$,

(ii) functions from $\mathbb{R}^2 \to \mathbb{R}^1$,

(iii) functions from $\mathbb{R}^1 \to \mathbb{R}^1$, and

(iv) functions from $\mathbb{R}^2 \to \mathbb{R}^2$.

Functions from $\mathbb{R}^1 \to \mathbb{R}^2$ can be visualized as curves in the plane. The graph of a
function from $\mathbb{R}^2 \to \mathbb{R}^1$ can be visualized as a surface in three-space. Functions from
$\mathbb{R}^1 \to \mathbb{R}^1$ are familiar from first-year calculus. We studied linear functions from one
plane to another in Chapter 1.






Figure 5.2 A function from $\mathbb{R}^2$ to $\mathbb{R}^1$


We now want to extend that study to include nonlinear functions from one
plane to another. In order not to have to consider the various cases separately,
we will introduce some uniform notation when we develop the theory. In what
follows we will let $V$, $W$, $Z$, etc. stand for either $\mathbb{R}^1$ or $\mathbb{R}^2$. So when we write

$$f: V \to W$$

(read: ‘$f$ maps $V$ to $W$’ or ‘$f$ is a function from $V$ to $W$’), we can be in any of the
four cases according as $V$ is $\mathbb{R}^1$ or $\mathbb{R}^2$ and $W$ is $\mathbb{R}^1$ or $\mathbb{R}^2$. In fact, our notation
and proofs will be such that we can allow $V$, $W$, etc. to be the spaces $\mathbb{R}^n$, or, more
generally, any finite-dimensional real vector spaces or affine spaces (when we get
to learn what these spaces are in Chapter 10). In fact, we shall illustrate some of
these more general computations in this chapter, even though we will not have
made all of the formal definitions.


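We begin by pointing out a fact that the reader is probably aware of by now, as an
easy generalization of the discussion in Chapter 1: a linear map from $\mathbb{R}^p$ to $\mathbb{R}^q$ is given
by a matrix with $q$ rows and $p$ columns. Thus if

$$A = \begin{pmatrix} 5 & 2 & 1 & 0 & 1 \\ 4 & 1 & -1 & 0 & 3 \\ 3 & 0 & 1 & 1 & 0 \end{pmatrix}$$

so that $A$ maps $\mathbb{R}^5 \to \mathbb{R}^3$, and $B$ is a matrix with four rows and three columns,
so that $B$ maps $\mathbb{R}^3 \to \mathbb{R}^4$, then $BA$ maps $\mathbb{R}^5 \to \mathbb{R}^4$ and so is a matrix with four rows
and five columns whose entries are computed according to the usual rules of
matrix multiplication. Each entry of $BA$ is the sum of products of a row of $B$ with a
column of $A$; for instance, an entry $x$ in the third column of $BA$ has the form

$$x = (2)(1) + (6)(-1) + (10)(1)$$

in our example.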

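In particular, a linear map from $\mathbb{R}^p \to \mathbb{R}^1 = \mathbb{R}$ (usually just called a linear function)
is given by a matrix with one row and $p$ columns. This is usually called a row
vector. Thus

$$l = (1, 2, 3, 4)$$

is the linear map from $\mathbb{R}^4 \to \mathbb{R}$ such that

$$l\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix} = 1, \qquad l\begin{pmatrix} 0 \\ 1 \\ 0 \\ 0 \end{pmatrix} = 2, \text{ etc.}$$

Evaluated on any vector, we have

$$l\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} = x + 2y + 3z + 4w.$$

So again, the value of the row vector

$$l = (a, b, c, d)$$

on the column vector

$$v = \begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix}$$

is given by the usual rule of matrix multiplication - this time with just one entry:

$$l(v) = ax + by + cz + dw.$$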


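If $A: \mathbb{R}^p \to \mathbb{R}^q$ and $l: \mathbb{R}^q \to \mathbb{R}$, then $l \circ A: \mathbb{R}^p \to \mathbb{R}$ is again given by matrix
multiplication: a $1 \times q$ matrix times a $q \times p$ matrix. For example, if $q = 3$ and $p = 5$
and

$$l = (1, 2, 3), \qquad A = \begin{pmatrix} 5 & 2 & 1 & 0 & 1 \\ 4 & 1 & -1 & 0 & 3 \\ 3 & 0 & 1 & 1 & 0 \end{pmatrix}$$

then

$$l \circ A = (1, 2, 3)\begin{pmatrix} 5 & 2 & 1 & 0 & 1 \\ 4 & 1 & -1 & 0 & 3 \\ 3 & 0 & 1 & 1 & 0 \end{pmatrix} = (22, 4, 2, 3, 7).$$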
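
The row-vector computation above can be reproduced in a few lines. This is a minimal sketch; the helper `matmul` and the test vector `v` are illustrative choices, not part of the text. It also checks that applying $A$ first and then $l$ gives the same number as applying the row vector $l \circ A$ directly.

```python
def matmul(P, Q):
    """Product of matrices given as lists of rows (hypothetical helper)."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))]
            for i in range(len(P))]

A = [[5, 2, 1, 0, 1],
     [4, 1, -1, 0, 3],
     [3, 0, 1, 1, 0]]      # maps R^5 -> R^3
l = [[1, 2, 3]]            # row vector, maps R^3 -> R

lA = matmul(l, A)          # row vector representing l o A : R^5 -> R
print(lA)                  # [[22, 4, 2, 3, 7]]

# Composition check on an arbitrary column vector v in R^5.
v = [[1], [-2], [0], [4], [3]]
print(matmul(l, matmul(A, v)), matmul(lA, v))
```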


One final bit of notational reminder from section 4.1. On the space $\mathbb{R}^k$ we have
the Euclidean scalar product $(\ ,\ )$ and associated norm $\|\ \|$ given by

$$\| v \|^2 = (v, v).$$



5.1. Big ‘oh’ and little ‘oh’

In the theory of the differential calculus of one variable, a function $f$ is said to
have a derivative $A$ at a point $x$ if $f$ is defined in some neighborhood of $x$ and the
difference quotient,

$$\frac{f(x + v) - f(x)}{v},$$

defined for all sufficiently small $v \ne 0$, tends to the limit $A$ as $v \to 0$. We would like
to generalize this definition to maps $f: V \to W$. Our first obstacle is that division
by a vector makes no sense, so we cannot use the notion of a difference quotient.
So we consider rather

$$f(x + v) - f(x) = Av + \phi(v). \qquad (5.1)$$


The condition that $A$ be the derivative of $f$ at $x$ is that the error term $\phi(v)$ go to
zero ‘faster than $v$’. We can give a precise meaning to the assertion in quotation
marks by requiring that

$$\lim \frac{\| \phi(v) \|}{\| v \|} = 0 \quad\text{as } \| v \| \to 0 \qquad (5.2)$$

or, to be even more precise, this means that

Given any $\epsilon > 0$ there exists a $\delta > 0$ such that

$$\| \phi(v) \| \le \epsilon \| v \| \qquad (5.3)$$

for all $v$ such that $\| v \| \le \delta$.

In (5.2) and (5.3), the expression $\| v \|$ denotes the length of the vector $v$ in the space
$V$ and perhaps we should make this explicit by writing $\| v \|_V$. Similarly, $\| \phi(v) \|$
denotes the length of the vector $\phi(v)$ in the space $W$, so to emphasize this point,
we might want to write $\| \phi(v) \|_W$. We would then write the first inequality in
(5.3) as

$$\| \phi(v) \|_W \le \epsilon \| v \|_V.$$

Since these subscripts would tend to clutter up the notation, we will not use them,
but stick to the notation (5.2) and (5.3).

For example, suppose that $f$, and hence $\phi$, is a map from $\mathbb{R}^2$ to $\mathbb{R}^1$. Suppose that
we write the most general vector $v$ in $\mathbb{R}^2$ as $v = \begin{pmatrix} x \\ y \end{pmatrix}$ and, for typographical
simplicity, write $\phi(v)$ as $\phi(x, y)$. Then $\| v \| = (x^2 + y^2)^{1/2}$ and $\| \phi(v) \| = |\phi(x, y)|$. In this
case, condition (5.3) reads:

Given any $\epsilon > 0$ there exists a $\delta > 0$ such that

$$|\phi(x, y)| \le \epsilon (x^2 + y^2)^{1/2}$$

for all $x$ and $y$ such that

$$(x^2 + y^2)^{1/2} \le \delta.$$

A function $\phi: V \to W$ which is defined in some ball about the origin and which
satisfies (5.3) is said to be ‘little oh of $v$’. In symbols, we write ‘$\phi$ is $o(v)$’ or, with some
abuse of notation, $\phi = o(v)$. Thus we would write the condition that $A$ is the
derivative of $f$ at $x$ as

$$f(x + v) - f(x) = Av + \phi(v) \quad\text{where } \phi(v) \text{ is } o(v) \text{ or ‘where } \phi(v) = o(v)\text{’}$$

or, even more succinctly, as

$$f(x + v) - f(x) = Av + o(v). \qquad (5.4)$$

This last version is logically a bit sloppy but is the one that we will frequently use for
convenience. The expression $o(v)$ in (5.4) really stands for ‘some function $\phi(v)$ which is
$o(v)$’. In many cases we are not interested in the error functions $\phi$, we just want to
know that they satisfy (5.3). So it is convenient not to have to introduce a separate
symbol for each function $\phi$ that arises.
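
To see what the condition $\phi = o(v)$ means in practice, one can tabulate the ratio $\|\phi(v)\| / \|v\|$ as $v$ shrinks. The sketch below is only an illustration: the two sample functions `phi` and `psi` on $\mathbb{R}^2$ and the helper `ratio` are assumptions made here, not examples from the text. The quadratic function has a ratio tending to zero, while the nonzero linear function does not.

```python
import math

def norm(v):
    """Euclidean length of a vector in R^2."""
    return math.hypot(v[0], v[1])

def ratio(func, v):
    """The quotient |func(v)| / ||v|| appearing in condition (5.2)."""
    return abs(func(v)) / norm(v)

phi = lambda v: v[0] ** 2 + v[1] ** 2   # quadratic: this one is o(v)
psi = lambda v: 3 * v[0] - v[1]         # linear and nonzero: not o(v)

for k in range(1, 6):
    r = 10.0 ** (-k)
    v = (0.6 * r, 0.8 * r)              # a vector of length r
    print(r, ratio(phi, v), ratio(psi, v))
```

Along this direction the first ratio equals $\|v\|$ itself, whereas the second stays at a fixed nonzero value, in agreement with the lemma that a linear map which is $o(v)$ must vanish.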

To get some feeling for the concept of $o(v)$, let us prove the following lemma:

Suppose that $\phi: V \to W$ is a linear transformation and that $\phi(v) = o(v)$. Then
$\phi = 0$. (5.5)

Proof. Suppose that $\phi(v) = Bv$. Then $\phi(rv) = r\phi(v)$ for any real number $r$. For any
$\epsilon > 0$, choose the $\delta$ so that $\| \phi(v) \| \le \epsilon \| v \|$ when $\| v \| \le \delta$. Now for any vector $w \ne 0$,
choose $r = \| w \| / \delta$, and write $w = rw'$. Then $\| w' \| = \delta$ so

$$\| \phi(w) \| = r \| \phi(w') \| \le r\epsilon \| w' \| = \epsilon r \delta = \epsilon \| w \|.$$

Thus

$$\| \phi(w) \| \le \epsilon \| w \| \quad\text{for all } w \ne 0$$

(and this is clearly true for $w = 0$ as well, since $\phi(0) = 0$ if $\phi$ is a linear map). But this
inequality is to hold for all $\epsilon$. So $\phi = 0$.

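From (5.5) it follows that if (5.4) holds, then the $A$ occurring in (5.4) is uniquely
determined. Indeed, suppose that

$$f(x + v) - f(x) = Av + \phi(v)$$

and

$$f(x + v) - f(x) = A'v + \phi'(v)$$

where both $\phi$ and $\phi'$ are $o(v)$. Then

$$(A' - A)v = \phi(v) - \phi'(v).$$

But, we claim, the sum or difference of two functions that are both $o(v)$ is again $o(v)$.
Indeed, given $\epsilon > 0$, choose $\delta_1$ and $\delta_2$ so that

$$\| \phi(v) \| \le \tfrac{1}{2}\epsilon \| v \| \quad\text{for } \| v \| \le \delta_1$$

and

$$\| \phi'(v) \| \le \tfrac{1}{2}\epsilon \| v \| \quad\text{for } \| v \| \le \delta_2.$$

Then choosing $\delta$ to be the smaller of the two numbers $\delta_1$ and $\delta_2$, we obtain, by the
triangle inequality,

$$\| \phi(v) - \phi'(v) \| \le \| \phi(v) \| + \| \phi'(v) \| \le \epsilon \| v \| \quad\text{for } \| v \| \le \delta.$$

Thus $(A' - A)v$ is a linear function of $v$ which is $o(v)$, and so $A' = A$ by (5.5).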

A function $f$ which satisfies (5.4) for some (and hence a unique) $A$ is said to be
differentiable at $x$. The unique linear transformation $A$ is then called the differential of
$f$ at $x$ and will be denoted by $df_x$. To repeat, the differential of $f$ at $x$ is the unique
linear map from $V$ to $W$ which approximates the actual change in $f$ at $x$ for small $v$ in
the sense that

$$f(x + v) - f(x) = df_x[v] + o(v).$$

In order to prove the basic theorems about the differential calculus, we will need
to assemble some facts about functions that are $o(v)$, and for this it is convenient to
introduce some more notation.

A subset $S$ of $V$ is called a neighborhood of 0 if it contains some ball about the
origin, i.e., if, for some $\delta > 0$, it contains the set of all $v$ with $\| v \| \le \delta$. Clearly, the




neighborhood of any point $x$. It will be a set which contains some ball about $x$, i.e.
which contains a set of the form $\{ y \mid \| y - x \| \le \delta \}$.

If $A$ is an invertible linear transformation from $V$ to $W$ (so, in particular, $V$ and $W$
have the same dimension), then we can find constants $k_1$ and $k_2 > 0$ so that

$$\| Av \| \le k_1 \| v \|$$

and

$$\| A^{-1}w \| \le k_2 \| w \|$$

or, setting $w = Av$,

$$k_2^{-1} \| v \| \le \| Av \|.$$

Thus the image of any ball of radius $r$ is contained in a ball of radius $k_1 r$ and contains
a ball of radius $k_2^{-1} r$. In particular, $A$ carries neighborhoods into neighborhoods as
does $A^{-1}$.



Figure 5.4 


Let us now return to the general case where $V$ and $W$ do not necessarily have
the same dimension. We will let $o(V, W)$ denote the space of all functions which
are $o(v)$. Thus, a function $\phi$ belongs to $o(V, W)$ if $\phi = o(v)$. In detail:

$\phi \in o(V, W)$ if $\phi$ is defined in some neighborhood of the origin and satisfies
(5.3).

We say that a function $\psi$ is $O(v)$ (read as ‘$\psi$ is big oh of $v$’) if $\psi$ is defined in some
neighborhood of 0 and there is some constant $k > 0$ such that

$$\| \psi(v) \| \le k \| v \|$$

for all $v$ in this neighborhood. For example, any linear map is automatically $O(v)$.
Also, clearly any function which is $o(v)$ is certainly $O(v)$. We let $O(V, W)$ denote the
space of all functions which are $O(v)$. Finally, we let $I(V, W)$ denote the space of
functions defined near 0 which tend to 0 as $v \to 0$. Thus

$\chi \in I(V, W)$ if $\chi$ is defined in some neighborhood of the origin and, for every
$\epsilon > 0$, there is a $\delta > 0$ such that

$$\| \chi(v) \| \le \epsilon \quad\text{when } \| v \| \le \delta.$$

Thus

$$o(V, W) \subset O(V, W) \subset I(V, W).$$




If for example we take $V = W = \mathbb{R}^1$ and define



$$\psi \in O(V, W) \text{ but } \psi \notin o(V, W)$$

and

$$\chi \in I(V, W) \text{ but } \chi \notin O(V, W)$$

so the above inclusions are strict.

We have proved that the sum of two functions in $o(V, W)$ is again in $o(V, W)$. The
same proof shows that the sum of two functions in $O(V, W)$ is in $O(V, W)$ and
similarly for $I(V, W)$.

We now study the behavior of these spaces under composition. Let $X$ be a third
space. We will prove the following three useful facts:

If $\psi_1 \in O(V, W)$ and $\psi_2 \in O(W, X)$, then $\psi_2 \circ \psi_1 \in O(V, X)$. (5.6)

If $\psi_1 \in O(V, W)$ and $\psi_2 \in o(W, X)$, then $\psi_2 \circ \psi_1 \in o(V, X)$. (5.7)

If $\psi_1 \in o(V, W)$ and $\psi_2 \in O(W, X)$, then $\psi_2 \circ \psi_1 \in o(V, X)$. (5.8)

Proof. If $\| \psi_1(v) \| \le k_1 \| v \|$ for $\| v \| \le \delta_1$ and $\| \psi_2(w) \| \le k_2 \| w \|$ when $\| w \| \le \delta_2$, then
$\psi_2 \circ \psi_1$ will be defined for $\| v \| \le \delta$ where $\delta$ is the smaller of the two numbers $\delta_1$
and $\delta_2 / k_1$. For this range of $v$, we have

$$\| \psi_2 \circ \psi_1(v) \| = \| \psi_2(\psi_1(v)) \| \le k_2 \| \psi_1(v) \| \le k_2 k_1 \| v \|$$

proving (5.6). If $\psi_2 \in o(W, X)$ we can choose $k_2$ as small as we like by choosing $\delta_2$
(and hence $\delta$) small. This proves (5.7). If $\psi_1 \in o(V, W)$, then we can choose $k_1$ as
small as we like by choosing $\delta_1$ (and hence $\delta$) sufficiently small. This proves (5.8).

If $\phi$ is a function from $V$ to $W$ and $g: V \to \mathbb{R}$ a real-valued function, the product
$g(v)\phi(v)$ makes sense for any $v$ that lies in the domain of both $\phi$ and $g$. So we can
form the function $g\phi$ which is a map from a subset of $V$ to $W$.

If $\psi \in O(V, W)$ and $g \in I(V, \mathbb{R})$, then $g\psi \in o(V, W)$. (5.9)

Proof. We are told that there is a $k$ such that $\| \psi(v) \| \le k \| v \|$ in some neighborhood
of the origin. Given any $\epsilon > 0$, choose $\delta$ so small that $|g(v)| \le \epsilon / k$ for all $v$ with
$\| v \| \le \delta$. Then, for such $v$,

$$\| g(v)\psi(v) \| \le (\epsilon / k) \| \psi(v) \| \le \epsilon \| v \|$$

proving (5.9). Similar arguments prove

If $\psi \in I(V, W)$ and $g \in O(V, \mathbb{R})$, then $g\psi \in o(V, W)$. (5.10)




calculus. 

5.2. The differential calculus 

Let $f: V \to W$ be defined in some neighborhood of a point $x \in V$. Define the function
$\nabla f_x$ by the formula

$$\nabla f_x(h) = f(x + h) - f(x).$$

It is defined for all $h$ in some neighborhood of 0 and measures the change in $f$
relative to its value at $x$. The function $f$ is continuous at $x$ if $\nabla f_x \in I(V, W)$. (This
means that $\nabla f_x(h)$ tends to 0 as $h \to 0$, so $f(x + h) \to f(x)$.) Recall that the function
$f$ is said to be differentiable at $x$ if there is a linear transformation $df_x: V \to W$
such that

$$\nabla f_x(h) = df_x[h] + o(h).$$

The linear transformation $df_x$ is uniquely determined by this equation and is called
the differential of $f$ at $x$. Any linear function belongs to $O(V, W)$, and the sum of
a function in $O(V, W)$ and a function in $o(V, W)$ lies in $O(V, W)$. From this we
conclude that

If $f$ is differentiable at $x$, then $\nabla f_x \in O(V, W)$. (5.13)

(In particular, since $O(V, W) \subset I(V, W)$, we conclude that if $f$ is differentiable at $x$
then it is certainly continuous at $x$.) If $f$ is a linear function, $f(x) = Ax$, then
$\nabla f_x(h) = Ah$, so $f$ is differentiable and its differential is given by $df_x = A$,
independent of $x$.

If $f$ is a constant function, then $\nabla f_x = 0$, and (5.4) holds with $A = 0$, so

A constant function is differentiable everywhere and its differential is
identically zero.

We now state and prove the rule about the differential of a sum:

If $f$ and $g$ are two functions from $V$ to $W$ and both are differentiable at $x$,
then so is their sum and

$$d(f + g)_x = df_x + dg_x. \qquad (5.14)$$

Proof. It is clear that $\nabla(f + g)_x = \nabla f_x + \nabla g_x$. Since

$$\nabla f_x = df_x + \phi_1$$

and

$$\nabla g_x = dg_x + \phi_2$$

where $\phi_1$ and $\phi_2$ are in $o(V, W)$, we conclude that

$$\nabla(f + g)_x = df_x + dg_x + \phi_1 + \phi_2.$$

Since $(\phi_1 + \phi_2) \in o(V, W)$, this proves (5.14).

We can multiply an $\mathbb{R}$-valued function $g$ with a $W$-valued function $f$ to get a
$W$-valued function. For this combination we can state the usual rule for the
derivative of a product:

Suppose that $f: V \to W$ and $g: V \to \mathbb{R}$ are both differentiable at $x$. Then their
product, $gf$, is also differentiable at $x$ and

$$d(gf)_x[h] = g(x)\,df_x[h] + (dg_x[h])f(x).$$

Proof.

$$\begin{aligned}
\nabla(gf)_x[h] &= g(x + h)f(x + h) - g(x)f(x) \\
&= g(x + h)(f(x + h) - f(x)) + (g(x + h) - g(x))f(x) \\
&= g(x)(f(x + h) - f(x)) + (g(x + h) - g(x))f(x) \\
&\qquad + (g(x + h) - g(x))(f(x + h) - f(x)) \\
&= g(x)\nabla f_x[h] + (\nabla g_x[h])f(x) + (\nabla g_x[h])(\nabla f_x[h]) \\
&= g(x)(df_x[h] + o(h)) + (dg_x[h] + o(h))f(x) + O(h) \cdot O(h),
\end{aligned}$$

since $f$ and $g$ are both differentiable at $x$ and hence both $\nabla f_x$ and $\nabla g_x$ are $O(h)$ by
(5.13). Now the product of two functions which are $O(h)$ is $o(h)$ by (5.9), since a
function which is $O(h)$ tends to 0 as $h \to 0$. Both $f$ and $g$ are bounded near $x$ since,
in fact, $g(x + h) - g(x)$ and $f(x + h) - f(x)$ both tend to zero. The product of a
bounded function and one which is $o(h)$ is again $o(h)$. Putting these facts into the
last expression above gives

$$\nabla(gf)_x[h] = g(x)\,df_x[h] + (dg_x[h])f(x) + o(h)$$

which was to be proved.

We now come to the very important:

Chain rule. Suppose that $f: V \to W$ is differentiable at $x \in V$ and that
$g: W \to X$ is differentiable at $y = f(x) \in W$. Then $g \circ f: V \to X$ is differentiable at
$x$ and its differential is given by

$$d(g \circ f)_x = (dg_{f(x)}) \circ (df_x). \qquad (5.15)$$

(On the right-hand side of this equation we have the composition of two linear
transformations, $dg_{f(x)}: W \to X$ and $df_x: V \to W$. On the left-hand side we
have the composition of $g$ and $f$.)

Proof.

$$\begin{aligned}
\nabla(g \circ f)_x[h] &= g(f(x + h)) - g(f(x)) \\
&= g(f(x) + \nabla f_x[h]) - g(f(x)) \\
&= \nabla g_{f(x)}[\nabla f_x[h]] \\
&= dg_{f(x)}[\nabla f_x[h]] + \phi(\nabla f_x[h]) \\
&= dg_{f(x)}[df_x[h]] + dg_{f(x)}[o(h)] + (\phi \circ \psi)(h),
\end{aligned}$$

where $\phi \in o(W, X)$ (coming from the error term in $\nabla g_{f(x)}$) and $\psi = \nabla f_x \in O(V, W)$ by
(5.13). By (5.7) this composite function is in $o(V, X)$. Also $dg_{f(x)}$ is linear, and hence in
$O(W, X)$, and thus the second term is a composite of an element in $O(W, X)$ with an
element of $o(V, W)$ and so is $o(V, X)$ by (5.8). Thus

$$\nabla(g \circ f)_x[h] = (dg_{f(x)} \circ df_x)[h] + o(h)$$

as was to be proved.

Examples 

We now give some examples of differentials and the chain rule. For functions
$\alpha: \mathbb{R}^1 \to \mathbb{R}^1$, the differential $d\alpha_x$ when evaluated on some $h \in \mathbb{R}$ is given by
multiplication by the derivative $\alpha'(x)$. Thus

$$d\alpha_x[h] = \alpha'(x)h.$$

This is just the definition of the derivative $\alpha'(x)$. For example, let $\alpha: \mathbb{R}^1 \to \mathbb{R}^1$ and
$\beta: \mathbb{R}^1 \to \mathbb{R}^1$ be given by

$$\alpha(y) = y^2, \qquad \beta(x) = 5x^3 + 1$$

so that

$$\alpha \circ \beta(x) = (5x^3 + 1)^2.$$

Then

$d\alpha_y$ is multiplication by $2y$,

$d\beta_x$ is multiplication by $15x^2$,

$d(\alpha \circ \beta)_x$ is multiplication by $2(5x^3 + 1)(15x^2)$

so

$d\alpha_{\beta(x)}$ is multiplication by $2(5x^3 + 1)$

and

$d\alpha_{\beta(x)} \circ d\beta_x$ is multiplication by $15x^2$ followed by multiplication by
$2(5x^3 + 1)$ or

$d\alpha_{\beta(x)} \circ d\beta_x$ is multiplication by $2(5x^3 + 1)(15x^2)$
or

$d\alpha_{\beta(x)} \circ d\beta_x = d(\alpha \circ \beta)_x$ - the chain rule.
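
The one-variable example above is easy to test numerically. The following sketch is an illustration only: the central-difference helper `derivative`, its step size, and the sample point are assumptions made here. It compares a numerical derivative of $\alpha \circ \beta$ with the product $2(5x^3 + 1)(15x^2)$ given by the chain rule.

```python
import math

def alpha(y):
    return y ** 2

def beta(x):
    return 5 * x ** 3 + 1

def derivative(func, x, h=1e-6):
    """Central difference approximation to func'(x) (illustrative helper)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 0.7
numeric = derivative(lambda t: alpha(beta(t)), x)    # d(alpha o beta)_x
chain = (2 * beta(x)) * (15 * x ** 2)                # d(alpha)_{beta(x)} times d(beta)_x
print(numeric, chain)
```

The two printed numbers agree up to the error of the finite-difference approximation.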

It is clear that the notation here is cumbersome. Leibniz’s notation for functions of
one variable is better:

If $\alpha$ is a function of $y$ write

$$\alpha' = \frac{d\alpha}{dy}$$

or rather

$$d\alpha = \alpha'\,dy.$$

This last equation is taken to mean that at any value of $y$,

$$d\alpha_y[h] = \alpha'(y)h.$$

Thus

$$d(y^2) = 2y\,dy$$

$$d(5x^3 + 1) = 15x^2\,dx.$$

The chain rule now says substitute

$$y = (5x^3 + 1)$$
$$dy = 15x^2\,dx$$

into the formula for $d(y^2)$ to get the formula for $d[(5x^3 + 1)^2]$. The chain rule
becomes mechanical substitution in the Leibniz notation.

We will continue to do some examples in our more cumbersome notation where, 
we hope, the meaning of the operations is clear. 

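Let $f: \mathbb{R}^1 \to \mathbb{R}^2$ and $g: \mathbb{R}^2 \to \mathbb{R}^1$ be given by

$$f(x) = \begin{pmatrix} x^2 + 1 \\ 2x - 1 \end{pmatrix}, \qquad g\begin{pmatrix} x \\ y \end{pmatrix} = x^2 y.$$

To evaluate $df_x$, we note that

$$f(x + s) - f(x) = \begin{pmatrix} 2xs + s^2 \\ 2s \end{pmatrix} = \begin{pmatrix} 2x \\ 2 \end{pmatrix}s + o(s)$$

so that $df_x$ is represented by the matrix

$$df_x = \begin{pmatrix} 2x \\ 2 \end{pmatrix}.$$

Similarly,

$$\begin{aligned}
g\begin{pmatrix} x + s \\ y + t \end{pmatrix} - g\begin{pmatrix} x \\ y \end{pmatrix} &= (x + s)^2(y + t) - x^2 y \\
&= x^2 t + 2sxy + 2sxt + s^2 y + s^2 t \\
&= (2xy, x^2)\begin{pmatrix} s \\ t \end{pmatrix} + o\begin{pmatrix} s \\ t \end{pmatrix}
\end{aligned}$$

so that $dg_{(x, y)}$ is the matrix

$$dg_{(x, y)} = (2xy, x^2).$$

The composite function $g \circ f: \mathbb{R}^1 \to \mathbb{R}^1$ is given by

$$g \circ f(x) = (x^2 + 1)^2(2x - 1)$$

so that

$$d(g \circ f)_x = 2(x^2 + 1)(2x)(2x - 1) + 2(x^2 + 1)^2.$$

The chain rule says this must equal the matrix product $dg_{f(x)} \circ df_x$ which is given by

$$\left(2(x^2 + 1)(2x - 1), (x^2 + 1)^2\right)\begin{pmatrix} 2x \\ 2 \end{pmatrix} = 2(x^2 + 1)(2x - 1)(2x) + 2(x^2 + 1)^2$$

which equals $d(g \circ f)_x$.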

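We can also form the composite function $f \circ g: \mathbb{R}^2 \to \mathbb{R}^2$ given by

$$f \circ g\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = \begin{pmatrix} (x^2 y)^2 + 1 \\ 2(x^2 y) - 1 \end{pmatrix}.$$

To compute $d(f \circ g)_{(x,y)}$ we expand

$$f \circ g\left(\begin{pmatrix} x + s \\ y + t \end{pmatrix}\right) - f \circ g\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = \begin{pmatrix} (x + s)^4 (y + t)^2 + 1 \\ 2(x + s)^2 (y + t) - 1 \end{pmatrix} - \begin{pmatrix} x^4 y^2 + 1 \\ 2x^2 y - 1 \end{pmatrix}$$

$$= \begin{pmatrix} (x^4 + 4x^3 s + 6x^2 s^2 + 4xs^3 + s^4)(y^2 + 2yt + t^2) - x^4 y^2 \\ 2(x^2 + 2xs + s^2)(y + t) - 2x^2 y \end{pmatrix}$$

$$= \begin{pmatrix} 2x^4 y t + 4x^3 y^2 s \\ 2x^2 t + 4xys \end{pmatrix} + \begin{pmatrix} x^4 t^2 + 4x^3 s(2yt + t^2) + (6x^2 s^2 + 4xs^3 + s^4)(y + t)^2 \\ 4xst + 2s^2 y + 2s^2 t \end{pmatrix}$$

$$= \begin{pmatrix} 4x^3 y^2 & 2x^4 y \\ 4xy & 2x^2 \end{pmatrix}\begin{pmatrix} s \\ t \end{pmatrix} + o\left(\begin{pmatrix} s \\ t \end{pmatrix}\right)$$

so that

$$d(f \circ g)_{(x,y)} = \begin{pmatrix} 4x^3 y^2 & 2x^4 y \\ 4xy & 2x^2 \end{pmatrix}.$$

The chain rule says that this must equal $df_{g(x,y)} \circ dg_{(x,y)}$, which is given by

$$df_{g(x,y)} \circ dg_{(x,y)} = \begin{pmatrix} 2x^2 y \\ 2 \end{pmatrix}(2xy, x^2) = \begin{pmatrix} 4x^3 y^2 & 2x^4 y \\ 4xy & 2x^2 \end{pmatrix}$$

which equals $d(f \circ g)_{(x,y)}$.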
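The two matrix products in this example can be checked numerically. The following is only a sketch, using the maps $f(x) = (x^2 + 1, 2x - 1)$ and $g(x, y) = x^2 y$ from the example; the helper names (`d_g_after_f`, `d_f_after_g`, `numeric_derivative`) and the sample point are assumptions made for illustration.

```python
def f(x):
    # f: R^1 -> R^2
    return (x * x + 1.0, 2.0 * x - 1.0)

def g(x, y):
    # g: R^2 -> R^1
    return x * x * y

def df(x):
    # matrix of df_x: the column vector (2x, 2)
    return (2.0 * x, 2.0)

def dg(x, y):
    # matrix of dg_(x,y): the row vector (2xy, x^2)
    return (2.0 * x * y, x * x)

def d_g_after_f(x):
    # row times column: dg_{f(x)} composed with df_x (a 1x1 matrix)
    row, col = dg(*f(x)), df(x)
    return row[0] * col[0] + row[1] * col[1]

def d_f_after_g(x, y):
    # column times row: df_{g(x,y)} composed with dg_(x,y) (a 2x2 matrix)
    col, row = df(g(x, y)), dg(x, y)
    return [[col[i] * row[j] for j in range(2)] for i in range(2)]

def numeric_derivative(func, x, eps=1e-6):
    # central difference quotient, used only as an independent check
    return (func(x + eps) - func(x - eps)) / (2.0 * eps)

x0, y0 = 1.5, -0.5
chain_gf = d_g_after_f(x0)
direct_gf = numeric_derivative(lambda x: g(*f(x)), x0)
chain_fg = d_f_after_g(x0, y0)
expected_fg = [[4 * x0**3 * y0**2, 2 * x0**4 * y0],
               [4 * x0 * y0, 2 * x0**2]]
print(chain_gf, direct_gf)
print(chain_fg, expected_fg)
```

Note the shapes: $dg \circ df$ is a row times a column (a number), while $df \circ dg$ is a column times a row (a $2 \times 2$ matrix).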

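As another example of the chain rule, let $F: \mathbb{R}^2 \to \mathbb{R}^2$ and $G: \mathbb{R}^2 \to \mathbb{R}^2$ be given by

$$F\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = \begin{pmatrix} x^2 + y \\ xy \end{pmatrix}, \qquad G\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = \begin{pmatrix} 3xy^2 \\ x^2 \end{pmatrix}$$

so that

$$F \circ G\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = \begin{pmatrix} 9x^2 y^4 + x^2 \\ 3x^3 y^2 \end{pmatrix}, \qquad d(F \circ G)_{(x,y)} = \begin{pmatrix} 18xy^4 + 2x & 36x^2 y^3 \\ 9x^2 y^2 & 6x^3 y \end{pmatrix}.$$

By the chain rule this must equal $dF_{G(x,y)} \circ dG_{(x,y)}$, which is given by

$$dF_{G(x,y)} \circ dG_{(x,y)} = \begin{pmatrix} 2(3xy^2) & 1 \\ x^2 & 3xy^2 \end{pmatrix}\begin{pmatrix} 3y^2 & 6xy \\ 2x & 0 \end{pmatrix}$$

$$= \begin{pmatrix} 18xy^4 + 2x & 36x^2 y^3 \\ 3x^2 y^2 + 6x^2 y^2 & 6x^3 y \end{pmatrix}$$

$$= d(F \circ G)_{(x,y)}.$$

In the next few sections we will spend some time extracting important consequences
of the chain rule.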

We first give some more ‘abstract’ examples of the chain rule and introduce some 
notation. 


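Let us consider the multiplication map $g: \mathbb{R}^2 \to \mathbb{R}^1$,

$$g\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = xy.$$

If $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$ and $\mathbf{h} = \begin{pmatrix} r \\ s \end{pmatrix}$ then

$$g(\mathbf{v} + \mathbf{h}) = (x + r)(y + s) = g(\mathbf{v}) + xs + yr + o(\mathbf{h})$$

so

$$dg_{\mathbf{v}}(\mathbf{h}) = xs + yr,$$

and its matrix (with one row and two columns) is

$$(y, x).$$

Let $f: \mathbb{R}^1 \to \mathbb{R}^2$ be given. We can think of $f$ as describing a curve in the plane, or,
more simply, as giving a pair of real-valued functions of one real variable,

$$f(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}.$$

Then

$$f(t + h) = \begin{pmatrix} x(t + h) \\ y(t + h) \end{pmatrix} = \begin{pmatrix} x(t) + x'(t)h + o(h) \\ y(t) + y'(t)h + o(h) \end{pmatrix}$$

so that $df_t$ is represented by the matrix $\begin{pmatrix} x'(t) \\ y'(t) \end{pmatrix}$, and the chain rule gives

$$dg_{f(t)} \circ df_t = (g \circ f)'(t) = x'(t)y(t) + x(t)y'(t).$$

But $(g \circ f)(t) = x(t)y(t)$. Thus the chain rule implies Leibniz’s formula for the
derivative of the product of two functions.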

Before proceeding, it will be convenient to introduce and explain some further
notation. Instead of writing

$$dg_{\mathbf{v}}(\mathbf{h}) = yr + xs \quad\text{where } \mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix},\ \mathbf{h} = \begin{pmatrix} r \\ s \end{pmatrix} \text{ and } g\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = xy,$$

it is more convenient to write all of this information as

$$d(xy) = y\,dx + x\,dy.$$

In this equation, the symbol $dx$ occurring on the right-hand side is understood
as a linear map from $\mathbb{R}^2 \to \mathbb{R}^1$: the map which assigns to each vector its first
coordinate. Thus

$$dx(\mathbf{h}) = r \quad\text{if } \mathbf{h} = \begin{pmatrix} r \\ s \end{pmatrix}$$

and similarly,

$$dy(\mathbf{h}) = s.$$

In the expression $y\,dx$, the $y$ is a function of $\mathbf{v}$, that function which assigns to $\mathbf{v}$ its
second coordinate, where $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$. So the terms like $y\,dx$ really depend on two
kinds of variables, the variable $\mathbf{v}$ which tells us where we are computing the derivative
and the $\mathbf{h}$ which is the measure of the small displacement. The $d(xy)$ that occurs
on the left-hand side is a shorthand way of writing ‘$dg_{\mathbf{v}}[\mathbf{h}]$ where $g$ is that function
defined by $g(\mathbf{v}) = xy$ when $\mathbf{v} = \begin{pmatrix} x \\ y \end{pmatrix}$’. In applying the chain rule as in the above
example, we would say:

Consider $x$ as the function* on $\mathbb{R}^2$ which assigns to each vector its first coordinate.
Then $(x \circ f)(t) = x(t)$ by the definition of the map $f$. By the chain rule,
$d(x \circ f)_t = x'(t)\,dt$, where, in this equation, $x'(t)$ is a function evaluated at the point $t$
where we are computing the derivative, and $dt$ is the part which measures the
small increment. So when we think of $x$ as a function of $t$ given to us by the map $f$,
we make the ‘substitution’ $dx = x'\,dt$ where now $x'$ is a function of $t$. Similarly, the
chain rule tells us that if we consider $y$ as a function of $t$ given to us by the map $f$
then we must ‘substitute’ $dy = y'\,dt$.

* It might be instructive here to reread the lengthy discussion in section 3.3 where we discuss
how a coordinate, such as $x$, is to be viewed as a function.

We would then write

$$d(g \circ f) = yx'\,dt + xy'\,dt$$

with $x$, $y$, $x'$ and $y'$ substituted on the right-hand side as explicit functions of $t$.

For example, suppose $x(t) = t + \sin t$, $y(t) = e^{2t}$. Then we would write

$$dg = d(xy) = y\,dx + x\,dy,$$

$$df = d\begin{pmatrix} t + \sin t \\ e^{2t} \end{pmatrix} = \begin{pmatrix} 1 + \cos t \\ 2e^{2t} \end{pmatrix} dt$$

and

$$d(g \circ f) = d\left((t + \sin t)(e^{2t})\right) = \left(e^{2t}(1 + \cos t) + 2e^{2t}(t + \sin t)\right)dt.$$
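As a sanity check on this substitution, one can compare the coefficient of $dt$ obtained from $y\,dx + x\,dy$ with a difference quotient of the product itself. This is a minimal sketch; the function names and the sample point `t0` are assumptions chosen for illustration.

```python
import math

def x(t):
    return t + math.sin(t)

def y(t):
    return math.exp(2.0 * t)

def product(t):
    # (g o f)(t) = x(t) y(t)
    return x(t) * y(t)

def leibniz_coefficient(t):
    # coefficient of dt in y dx + x dy after substituting dx = x' dt, dy = y' dt
    dx = 1.0 + math.cos(t)
    dy = 2.0 * math.exp(2.0 * t)
    return y(t) * dx + x(t) * dy

def central_difference(func, t, h=1e-6):
    # independent numerical estimate of the derivative
    return (func(t + h) - func(t - h)) / (2.0 * h)

t0 = 0.7
formula_value = leibniz_coefficient(t0)
numeric_value = central_difference(product, t0)
print(formula_value, numeric_value)
```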

Let us state the chain rule once more in diagrammatic form: We are given two
differentiable maps $f: V \to W$ and $g: W \to Z$, so we can form their composite
$g \circ f: V \to Z$. At some point $\mathbf{v}$ in $V$ we can apply $f$ to get to $f(\mathbf{v})$ and then $g$ to get
to $g(f(\mathbf{v}))$. In computing $d(g \circ f)_{\mathbf{v}}(\mathbf{h})$ we can follow the maps along, by first applying
$df_{\mathbf{v}}$ to $\mathbf{h}$ and then $dg_{f(\mathbf{v})}$ to the image.





Let us now do some slightly more sophisticated computations with the chain
rule. In these computations we will take $V$, $W$ etc. to be higher-dimensional vector
spaces, so the logical purist might want to postpone studying them until after
reading the chapter on linear algebra. Nevertheless, we recommend having a look
at them here. We begin with a computation of the derivative of a product of two
matrices. Let $V$ be the vector space consisting of pairs of $n \times n$ matrices, so a typical
vector in $V$ is of the form

$$\mathbf{v} = \begin{pmatrix} A \\ B \end{pmatrix}$$

where $A$ and $B$ are $n \times n$ matrices. (This becomes a vector space by componentwise
addition and scalar multiplication:

$$\text{if } \mathbf{v} = \begin{pmatrix} A \\ B \end{pmatrix} \text{ and } \mathbf{v}' = \begin{pmatrix} A' \\ B' \end{pmatrix} \text{ then } \mathbf{v} + \mathbf{v}' = \begin{pmatrix} A + A' \\ B + B' \end{pmatrix} \text{ and } a\mathbf{v} = \begin{pmatrix} aA \\ aB \end{pmatrix}.$$

This obviously makes $V$ into a vector space of dimension $2n^2$.) Let $W$ denote the
vector space of all $n \times n$ matrices, and define the map $g: V \to W$ by

$$g\left(\begin{pmatrix} A \\ B \end{pmatrix}\right) = AB.$$

If $\mathbf{v} = \begin{pmatrix} A \\ B \end{pmatrix}$ and $\mathbf{h} = \begin{pmatrix} X \\ Y \end{pmatrix}$ then

$$g(\mathbf{v} + \mathbf{h}) - g(\mathbf{v}) = (A + X)(B + Y) - AB = XB + AY + XY = XB + AY + o(\mathbf{h})$$

so

$$dg_{\mathbf{v}}(\mathbf{h}) = XB + AY. \tag{5.16}$$

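In doing computations, we might want to use our more convenient notation which
drops the subscript $\mathbf{v}$ and the values at a particular $\mathbf{h}$. We could write (5.16) as

$$d(AB) = (dA)B + A\,dB. \tag{5.17}$$

In this notation, the $AB$ occurring on the left is a sloppy but convenient way of
writing the function $g$. The $dA$ occurring on the right is the derivative of the function
which assigns to $\begin{pmatrix} A \\ B \end{pmatrix}$ the matrix $A$. This derivative, when evaluated at the point
$\begin{pmatrix} A \\ B \end{pmatrix}$ on the vector $\begin{pmatrix} X \\ Y \end{pmatrix}$, yields the value $X$. Thus $dA$ is the linear map which
assigns to each $\begin{pmatrix} X \\ Y \end{pmatrix}$ the value $X$. Similarly, $dB$ assigns to $\begin{pmatrix} X \\ Y \end{pmatrix}$ the value $Y$,
so that (5.17), evaluated on $\begin{pmatrix} X \\ Y \end{pmatrix}$, is just (5.16).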


As another example of this notation, let $f$ denote the map from $W$ to $V$ given by

$$f(A) = \begin{pmatrix} A \\ A \end{pmatrix}.$$

Since $f$ is linear, we know that its derivative is independent of $A$ and is just the
same map again, evaluated on vectors, i.e.

$$df_A(X) = \begin{pmatrix} X \\ X \end{pmatrix}.$$

In the differential notation we would write this as

$$d\begin{pmatrix} A \\ A \end{pmatrix} = \begin{pmatrix} dA \\ dA \end{pmatrix}$$

(where again, $dA$ is the linear function which assigns the value $X$ to any element
$X$). Now let us consider the map $h$ of $W \to W$ defined by

$$h(A) = A^2.$$

We clearly have $h(A) = g(f(A))$ or $h = g \circ f$. So the chain rule applies:

Figure 5.6

it says:

$$dh_A(X) = d(g \circ f)_A(X) = dg_{f(A)}(df_A(X)) = dg_{f(A)}\left(\begin{pmatrix} X \\ X \end{pmatrix}\right) = XA + AX.$$

We would write this computation in the ‘differential notation’ as follows: Make
the ‘substitutions’ $A = A$ and $B = A$ in (5.17) to obtain

$$d(A^2) = (dA)A + A(dA).$$

(Notice once again, that on account of the non-commutative nature of matrix
multiplication this is the correct generalization of the formula $d(x^2) = 2x\,dx$ of functions
of one variable. It is not true that $d(A^2) = 2A\,dA$.)
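The difference between $(dA)A + A(dA)$ and $2A\,dA$ shows up immediately in a numerical experiment. The sketch below is only illustrative: the $2 \times 2$ matrices `A` and `X` are arbitrary sample values, and the helpers `matmul` and `combine` are assumptions introduced here.

```python
def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(P, Q, weight):
    # entrywise P + weight * Q
    n = len(P)
    return [[P[i][j] + weight * Q[i][j] for j in range(n)] for i in range(n)]

def max_abs(M):
    return max(abs(entry) for row in M for entry in row)

A = [[1.0, 2.0], [0.0, 3.0]]   # point at which we differentiate h(A) = A^2
X = [[0.5, -1.0], [2.0, 0.25]]  # direction of the displacement

eps = 1e-6
plus = combine(A, X, eps)
minus = combine(A, X, -eps)
# symmetric difference quotient (h(A + eps X) - h(A - eps X)) / (2 eps)
difference = combine(matmul(plus, plus), matmul(minus, minus), -1.0)
numeric = [[entry / (2.0 * eps) for entry in row] for row in difference]

exact = combine(matmul(X, A), matmul(A, X), 1.0)    # XA + AX
naive = [[2.0 * entry for entry in row] for row in matmul(A, X)]  # 2AX

error_exact = max_abs(combine(numeric, exact, -1.0))
error_naive = max_abs(combine(numeric, naive, -1.0))
print(error_exact, error_naive)
```

For this choice of `A` and `X` the matrices do not commute, so the one-variable formula is visibly wrong while $XA + AX$ matches the difference quotient.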

The Born expansion 

Let us now consider the map (inv) which assigns to each invertible matrix its
inverse, so

$$(\mathrm{inv})(A) = A^{-1}.$$

The map (inv) is not defined on all of $W$, but only on that subset of $W$ consisting
of all matrices which are invertible. Assuming that inv is differentiable where defined,
we shall show how to compute the derivative of the map (inv) using the chain
rule: Define the map $f$ by

$$f(A) = \begin{pmatrix} A \\ A^{-1} \end{pmatrix}$$

or, more symbolically,

$$f = \begin{pmatrix} \mathrm{id} \\ \mathrm{inv} \end{pmatrix}.$$

Recall that $g$ is the map defined by

$$g\left(\begin{pmatrix} A \\ B \end{pmatrix}\right) = AB.$$

Then $(g \circ f)(A) = AA^{-1} = I$ where $I$ is the unit matrix. In other words, $g \circ f$ is a
constant, and hence $d(g \circ f) = 0$. By the chain rule,

$$df_A(X) = \begin{pmatrix} d_A(\mathrm{id})(X) \\ d_A(\mathrm{inv})(X) \end{pmatrix} = \begin{pmatrix} X \\ d_A(\mathrm{inv})(X) \end{pmatrix}$$

and, by the chain rule again,

$$0 = dg_{f(A)}(df_A(X)) = XA^{-1} + A\,(d_A(\mathrm{inv})(X)).$$

Multiplying this equation on the left by $A^{-1}$ and solving for $d_A(\mathrm{inv})(X)$ gives

$$d_A(\mathrm{inv})(X) = -A^{-1}XA^{-1}.$$

In the differential notation: since $AA^{-1} = I$, we know that $d(AA^{-1}) = 0$. ‘Substituting’ $A$ and $A^{-1}$ for $A$ and $B$ in
the formula $d(AB) = (dA)B + A(dB)$ gives

$$0 = d(AA^{-1}) = (dA)A^{-1} + A\,d(A^{-1})$$

and solving this equation for $d(A^{-1})$ gives the formula

$$d(A^{-1}) = -A^{-1}(dA)A^{-1}. \tag{5.18}$$

(This is the correct generalization to matrices of the formula $d(1/x) = -(1/x^2)\,dx$
of one-variable calculus.) We pause to give a slightly different explanation of the
preceding formula. Suppose that $A$ is an invertible matrix, i.e. that $\operatorname{Det} A \neq 0$. Then
if $X$ is a matrix whose entries are sufficiently small, $\operatorname{Det}(A + X) \neq 0$ so that $A + X$
is also invertible. We can write

$$A + X = (I + XA^{-1})A.$$

If $X$ is sufficiently small the matrix $XA^{-1}$ will also be small and the series

$$(I + XA^{-1})^{-1} = I - (XA^{-1}) + (XA^{-1})^2 - (XA^{-1})^3 + \cdots$$

will converge. Then we have

$$(A + X)^{-1} = [(I + XA^{-1})A]^{-1} = A^{-1}(I + XA^{-1})^{-1} = A^{-1}\left(I - (XA^{-1}) + (XA^{-1})^2 - \cdots\right)$$

or

$$(A + X)^{-1} = A^{-1} - A^{-1}XA^{-1} + A^{-1}XA^{-1}XA^{-1} - A^{-1}XA^{-1}XA^{-1}XA^{-1} + \cdots.$$

This series is known as the Born expansion, after the
theoretical physicist Max Born. The formula (5.18) follows from the Born expansion
when we drop all terms which are of higher order in $X$. In the physics literature
the approximation given by (5.18) is known as the first Born approximation. It is
of basic importance in scattering theory. As we have seen, we did not have to know
the entire Born expansion in order to derive the first Born approximation; we got
it straight from the chain rule.

On the other hand, a moment’s reflection shows that the Born expansion implies
that

$$(A + X)^{-1} - A^{-1} = -A^{-1}XA^{-1} + o(X).$$

This proves that the function (inv) is differentiable - a fact that we had to assume
in applying the chain rule.
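The statement that the remainder is $o(X)$ can be illustrated numerically: if the first Born approximation is correct, halving $X$ should cut the error by roughly a factor of four, since the first neglected term is quadratic in $X$. The sketch below assumes arbitrary sample matrices `A` and `X0` and a hypothetical helper `first_born_error`; it is a check on $2 \times 2$ matrices, not a general implementation.

```python
def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def combine(P, Q, weight):
    # entrywise P + weight * Q
    return [[P[i][j] + weight * Q[i][j] for j in range(2)] for i in range(2)]

def max_abs(M):
    return max(abs(entry) for row in M for entry in row)

A = [[2.0, 1.0], [1.0, 3.0]]     # invertible sample matrix (Det A = 5)
X0 = [[1.0, -2.0], [0.5, 1.0]]   # direction of the perturbation
A_inv = inverse_2x2(A)

def first_born_error(t):
    # size of (A + X)^{-1} - (A^{-1} - A^{-1} X A^{-1}) for X = t * X0
    X = [[t * entry for entry in row] for row in X0]
    exact = inverse_2x2(combine(A, X, 1.0))
    first_order = combine(A_inv, matmul(matmul(A_inv, X), A_inv), -1.0)
    return max_abs(combine(exact, first_order, -1.0))

error_full = first_born_error(1e-2)
error_half = first_born_error(5e-3)
ratio = error_full / error_half
print(error_full, error_half, ratio)
```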

Let $B$ be a constant matrix, and consider the map $f(A) = ABA^{-1}$. Then

$$d(ABA^{-1}) = (dA)BA^{-1} + AB(dA^{-1})$$

$$= (dA)BA^{-1} - ABA^{-1}(dA)A^{-1}.$$

In other words,

$$d_A f(X) = XBA^{-1} - ABA^{-1}XA^{-1}.$$

Suppose that $t \mapsto A(t)$ is some differentiable curve of matrices, and let

$$C(t) = A(t)BA(t)^{-1}$$

where $B$ is a constant matrix and we assume that $A(t)$ is invertible for all $t$. Applying
the chain rule and the preceding formula we see that

$$C'(t) = A'(t)BA(t)^{-1} - A(t)BA(t)^{-1}A'(t)A(t)^{-1}.$$

Suppose that $A(0) = I$ and $A'(0) = X$. Then setting $t = 0$ in the preceding formula
gives

$$C'(0) = XB - BX.$$

This formula is one of the most basic in mathematics and physics. The right-hand
side of this formula is called the commutator of $X$ and $B$ and is denoted by $[X, B]$
so

$$[X, B] = XB - BX.$$

For example, suppose that $A(t) = \exp tX$ so

$$A(t) = I + tX + \tfrac{1}{2}t^2X^2 + \cdots.$$

Then clearly $A(0) = I$ and $A'(0) = X$ so the above formula applies. Let us verify it
directly. We have $A(t)^{-1} = (\exp tX)^{-1} = \exp(-tX) = I - tX + \tfrac{1}{2}t^2X^2 - \cdots$ so

$$A(t)BA(t)^{-1} = (I + tX + \tfrac{1}{2}t^2X^2 + \cdots)B(I - tX + \tfrac{1}{2}t^2X^2 - \cdots)$$

$$= B + t(XB - BX) + \tfrac{1}{2}t^2(X^2B - 2XBX + BX^2) + \cdots.$$

Collecting the terms which are of degree two or higher in $t$ gives

$$A(t)BA(t)^{-1} = B + t[X, B] + o(t).$$
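The formula $C'(0) = [X, B]$ can be checked numerically by building $\exp tX$ from its power series. The following sketch assumes arbitrary $2 \times 2$ sample matrices `X` and `B`, a truncated series (`exp_series`, a helper introduced here for illustration), and uses $\exp(-tX)$ for the inverse, as in the text.

```python
def matmul(P, Q):
    n = len(P)
    return [[sum(P[i][k] * Q[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def combine(P, Q, weight):
    # entrywise P + weight * Q
    n = len(P)
    return [[P[i][j] + weight * Q[i][j] for j in range(n)] for i in range(n)]

def scale(c, M):
    return [[c * entry for entry in row] for row in M]

def exp_series(M, terms=25):
    # truncated power series I + M + M^2/2! + ...
    n = len(M)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result, term = identity, identity
    for k in range(1, terms):
        term = scale(1.0 / k, matmul(term, M))
        result = combine(result, term, 1.0)
    return result

X = [[0.0, 1.0], [-2.0, 0.5]]
B = [[1.0, 2.0], [3.0, 4.0]]

def conjugate(t):
    # C(t) = A(t) B A(t)^{-1} with A(t) = exp(tX), A(t)^{-1} = exp(-tX)
    return matmul(matmul(exp_series(scale(t, X)), B), exp_series(scale(-t, X)))

h = 1e-4
numeric = scale(1.0 / (2.0 * h), combine(conjugate(h), conjugate(-h), -1.0))
commutator = combine(matmul(X, B), matmul(B, X), -1.0)   # XB - BX
max_error = max(abs(entry) for row in combine(numeric, commutator, -1.0)
                for entry in row)
print(commutator, max_error)
```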



Kepler motion 

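We have seen that the chain rule implies Leibniz’s rule for the derivative of a
product - even for the product of matrices where the multiplication is not
commutative. We now want to apply this same reasoning to the so-called vector
product in $\mathbb{R}^3$. (We will remind you of its definition in a moment.) As a consequence,
we will derive Kepler’s second law of planetary motion.

In three-dimensional space there is a vector product defined as follows:

$$\text{if } \mathbf{v} = \begin{pmatrix} x \\ y \\ z \end{pmatrix} \text{ and } \mathbf{w} = \begin{pmatrix} p \\ q \\ r \end{pmatrix} \text{ then } \mathbf{v} \times \mathbf{w} = \begin{pmatrix} yr - zq \\ zp - xr \\ xq - yp \end{pmatrix}.$$

It follows immediately from the definition that

$$(\mathbf{v}_1 + \mathbf{v}_2) \times \mathbf{w} = \mathbf{v}_1 \times \mathbf{w} + \mathbf{v}_2 \times \mathbf{w}, \qquad \mathbf{v} \times (\mathbf{w}_1 + \mathbf{w}_2) = \mathbf{v} \times \mathbf{w}_1 + \mathbf{v} \times \mathbf{w}_2,$$

$$(a\mathbf{v}) \times \mathbf{w} = \mathbf{v} \times (a\mathbf{w}) = a(\mathbf{v} \times \mathbf{w})$$

and

$$\mathbf{v} \times \mathbf{v} = 0.$$

It follows from the first three equations that $\times$ acts like a multiplication and hence
that

$$d(\mathbf{v} \times \mathbf{w}) = d\mathbf{v} \times \mathbf{w} + \mathbf{v} \times d\mathbf{w}.$$

In particular, if $\mathbf{v}(t)$ and $\mathbf{w}(t)$ are curves in $\mathbb{R}^3$ and if we set
$\mathbf{u}(t) = \mathbf{v}(t) \times \mathbf{w}(t)$, then

$$\mathbf{u}'(t) = \mathbf{v}'(t) \times \mathbf{w}(t) + \mathbf{v}(t) \times \mathbf{w}'(t).$$

Let $\mathbf{r}(t)$ denote the position of a particle at time $t$ and $\mathbf{p}(t)$ its momentum, and set
$\boldsymbol{\mu}(t) = \mathbf{p}(t) \times \mathbf{r}(t)$, the angular momentum of the particle about the origin.

Suppose that the particle has mass $m$ and that it is subject to a force $\mathbf{F}(t)$ pointing
along the line from the origin to the particle, so that $\mathbf{F}(t) = c(t)\mathbf{r}(t)$. Then

$$\mathbf{r}'(t) = (1/m)\mathbf{p}(t) \quad\text{and}\quad \mathbf{p}'(t) = \mathbf{F}(t) = c(t)\mathbf{r}(t)$$

and hence

$$\boldsymbol{\mu}'(t) = \mathbf{p}'(t) \times \mathbf{r}(t) + \mathbf{p}(t) \times \mathbf{r}'(t)$$

$$= c(t)\mathbf{r}(t) \times \mathbf{r}(t) + (1/m)\mathbf{p}(t) \times \mathbf{p}(t) = 0.$$

In other words, $\boldsymbol{\mu}$ must be a constant. This law is known as the conservation of
angular momentum. Let us suppose (for simplicity) that $\boldsymbol{\mu} \neq 0$. It follows easily from
the definition of vector multiplication that for any vectors $\mathbf{v}$ and $\mathbf{w}$ we always have
$(\mathbf{v} \times \mathbf{w}) \cdot \mathbf{w} = 0$. Since $\boldsymbol{\mu} = \mathbf{p}(t) \times \mathbf{r}(t)$ we conclude that $\boldsymbol{\mu} \cdot \mathbf{r}(t) = 0$ for all $t$. In other
words the particle always moves in a fixed plane, the plane perpendicular to $\boldsymbol{\mu}$. Let us
rotate our coordinate system in $\mathbb{R}^3$ so that $\boldsymbol{\mu}$ lies along the $z$-axis, and hence the
particle lies in the $xy$-plane. Thus

$$\mathbf{r}(t) = \begin{pmatrix} x(t) \\ y(t) \\ 0 \end{pmatrix} \text{ and therefore } \mathbf{p}(t) = m\begin{pmatrix} x'(t) \\ y'(t) \\ 0 \end{pmatrix} \text{ and } \boldsymbol{\mu} = m\begin{pmatrix} 0 \\ 0 \\ x'(t)y(t) - x(t)y'(t) \end{pmatrix}.$$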

Thus the condition that $\boldsymbol{\mu}$ be constant implies that the expression $x'(t)y(t) - x(t)y'(t)$
is constant. To understand the meaning of this condition, let us draw the trajectory
of the particle in the $xy$-plane. Up to terms which are $o(h)$, the area bounded by the
vector $\begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$, the trajectory, and the vector $\begin{pmatrix} x(t + h) \\ y(t + h) \end{pmatrix}$ is the same as the area of the
triangle spanned by these two vectors, the hatched region in figure 5.7. The area of the
triangle is (up to sign) given by

$$\tfrac{1}{2}\left(x(t + h)y(t) - x(t)y(t + h)\right).$$

Figure 5.7

But

$$x(t + h) = x(t) + hx'(t) + o(h)$$

and

$$y(t + h) = y(t) + hy'(t) + o(h)$$

so the area of the triangle is given by

$$\tfrac{1}{2}\left(x'(t)y(t) - x(t)y'(t)\right)h + o(h).$$

We conclude that the rate at which ‘area is swept out by the radius vector’ is a
constant, Kepler’s second law. Thus, by use of the chain rule, we see that Kepler’s
second law, and the fact that the particle moves in a fixed plane, follow whenever
there is a central force law. The fact that the planets move in a fixed plane and
sweep out equal areas in equal times is a consequence of the fact that their motion
is determined by a force directed toward the sun. The preceding derivation of
Kepler’s second law is due to Newton.
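The conservation law can be watched in a small simulation. The sketch below is an illustration under stated assumptions, not part of the derivation: it takes a hypothetical inverse-square central force $\mathbf{F} = c\,\mathbf{r}$ with $c = -1/|\mathbf{r}|^3$, unit mass, arbitrary initial data, and a velocity Verlet time step. Each half-step changes $\mathbf{p}$ by a multiple of $\mathbf{r}$, or $\mathbf{r}$ by a multiple of $\mathbf{p}$, so $x'y - xy'$ stays constant up to rounding, mirroring the argument in the text.

```python
def angular_momentum_z(p, r):
    # third component of p x r for vectors lying in the xy-plane
    return p[0] * r[1] - p[1] * r[0]

def force(r):
    # assumed central force F = c(r) r with c = -1/|r|^3 (inverse-square attraction)
    dist = (r[0] ** 2 + r[1] ** 2) ** 0.5
    c = -1.0 / dist ** 3
    return (c * r[0], c * r[1])

def step(r, p, m, dt):
    # velocity Verlet: half kick, drift, half kick
    fx, fy = force(r)
    p_half = (p[0] + 0.5 * dt * fx, p[1] + 0.5 * dt * fy)
    r_new = (r[0] + dt * p_half[0] / m, r[1] + dt * p_half[1] / m)
    fx, fy = force(r_new)
    p_new = (p_half[0] + 0.5 * dt * fx, p_half[1] + 0.5 * dt * fy)
    return r_new, p_new

m, dt = 1.0, 1e-3
r, p = (1.0, 0.0), (0.0, 0.8)
mu_start = angular_momentum_z(p, r)
max_drift = 0.0
for _ in range(5000):
    r, p = step(r, p, m, dt)
    max_drift = max(max_drift, abs(angular_momentum_z(p, r) - mu_start))
print(mu_start, max_drift)
```

Half of `mu_start / m` is, up to sign, the constant rate at which area is swept out.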


5.4. Partial derivatives and differential forms 

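In this section we will introduce some concepts and some notation that are
convenient for the chain rule. Let us consider a differentiable function $f: \mathbb{R}^k \to \mathbb{R}$.
For example, take $k = 3$ and suppose that

$$f\left(\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right) = x^2 y^3 z^4.$$

Then $df_{\mathbf{v}}$ is a linear map from $\mathbb{R}^3 \to \mathbb{R}$. So $df_{\mathbf{v}}$ can be represented as a row vector. We
claim that, at any point $\mathbf{v} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$, the row vector is given by

$$df_{\mathbf{v}} = (2xy^3z^4,\ 3x^2y^2z^4,\ 4x^2y^3z^3).$$

To check this, we need only to evaluate on each of the vectors $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$, $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$ and
$\begin{pmatrix} 0 \\ 0 \\ 1 \end{pmatrix}$. For example,

$$f\left(\mathbf{v} + s\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right) - f(\mathbf{v}) = df_{\mathbf{v}}\left[s\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right] + o(s) = s\,df_{\mathbf{v}}\left[\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right] + o(s)$$

by the definition of $df_{\mathbf{v}}$ and the fact that $df_{\mathbf{v}}[\mathbf{h}]$ is linear in $\mathbf{h}$. Now

$$f\left(\mathbf{v} + s\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right) - f(\mathbf{v}) = f\left(\begin{pmatrix} x + s \\ y \\ z \end{pmatrix}\right) - f\left(\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right),$$

so that

$$df_{\mathbf{v}}\left[\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}\right] = \lim_{s \to 0} \frac{1}{s}\left[f\left(\begin{pmatrix} x + s \\ y \\ z \end{pmatrix}\right) - f\left(\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right)\right].$$

This limit is the derivative of $f$ with respect to $x$ when $y$ and $z$ are held fixed; it is called
the partial derivative of $f$ with respect to $x$ and is written $\partial f/\partial x$. For $f = x^2y^3z^4$ it
equals $2xy^3z^4$, the first entry of the row vector above, and the other two entries are
checked in the same way. Thus

$$df_{\mathbf{v}} = \left(\frac{\partial f}{\partial x}(\mathbf{v}),\ \frac{\partial f}{\partial y}(\mathbf{v}),\ \frac{\partial f}{\partial z}(\mathbf{v})\right).$$

There is a more convenient way of organizing this information. Recall that we have written
$dx$ for the linear function which assigns to each vector its first component. Thus

$$dx = (1, 0, 0)$$

and similarly

$$dy = (0, 1, 0), \qquad dz = (0, 0, 1).$$

Then we can write the equation

$$df_{\mathbf{v}} = \left(\frac{\partial f}{\partial x}(\mathbf{v}),\ \frac{\partial f}{\partial y}(\mathbf{v}),\ \frac{\partial f}{\partial z}(\mathbf{v})\right) = \frac{\partial f}{\partial x}(\mathbf{v})(1, 0, 0) + \frac{\partial f}{\partial y}(\mathbf{v})(0, 1, 0) + \frac{\partial f}{\partial z}(\mathbf{v})(0, 0, 1)$$

as

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz.$$

Thus

$$d(x^2y^3z^4) = 2xy^3z^4\,dx + 3x^2y^2z^4\,dy + 4x^2y^3z^3\,dz.$$

The expression on the right is a sum of three terms, each a function times a $dx$
or a $dy$ or a $dz$. Such a sum is called a linear differential form. Its meaning is that
it is a rule which assigns to each point of $\mathbb{R}^3$ a row vector. In particular, we can take
$f = x$ to get

$$df = dx.$$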
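The claimed row vector for $f = x^2y^3z^4$ can be compared with difference quotients taken along each coordinate direction, which is exactly the evaluation on the three basis vectors described above. This is a sketch only; the helper `numeric_partials` and the sample point are assumptions.

```python
def f(x, y, z):
    return x ** 2 * y ** 3 * z ** 4

def claimed_row_vector(x, y, z):
    # coefficients of dx, dy, dz in d(x^2 y^3 z^4)
    return (2 * x * y ** 3 * z ** 4,
            3 * x ** 2 * y ** 2 * z ** 4,
            4 * x ** 2 * y ** 3 * z ** 3)

def numeric_partials(func, point, eps=1e-6):
    # central difference quotient along each coordinate direction in turn
    partials = []
    for i in range(len(point)):
        up, down = list(point), list(point)
        up[i] += eps
        down[i] -= eps
        partials.append((func(*up) - func(*down)) / (2.0 * eps))
    return tuple(partials)

v = (1.2, -0.7, 0.9)
exact = claimed_row_vector(*v)
approx = numeric_partials(f, v)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
print(exact, approx)
```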

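With this notation, the chain rule reduces to substitution. Let us illustrate what we
mean. Consider the map $\phi: \mathbb{R}^2 \to \mathbb{R}^2$ given by

$$\phi\left(\begin{pmatrix} r \\ \theta \end{pmatrix}\right) = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}.$$

Let $f: \mathbb{R}^2 \to \mathbb{R}$ be some function, say

$$f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = x^3 + y^2x.$$

Then

$$f \circ \phi\left(\begin{pmatrix} r \\ \theta \end{pmatrix}\right) = r^3\cos^3\theta + r^3\sin^2\theta\cos\theta = r^3\cos\theta.$$

The map $d\phi_{(r,\theta)}$ will be a matrix

$$d\phi_{(r,\theta)} = \begin{pmatrix} a & b \\ c & d \end{pmatrix}.$$

To find the top row we multiply on the left by the row vector $(1, 0)$ so

$$(1, 0)\begin{pmatrix} a & b \\ c & d \end{pmatrix} = (a, b).$$

Now $(1, 0)$ is just $dx$. The chain rule says that

$$dx \circ d\phi_{(r,\theta)} = d(x \circ \phi)_{(r,\theta)}.$$

But $x \circ \phi = r\cos\theta$ so

$$d(x \circ \phi) = \cos\theta\,dr - r\sin\theta\,d\theta = (\cos\theta, -r\sin\theta)$$

as a row vector. So $a = \cos\theta$, $b = -r\sin\theta$. Similarly

$$(c, d) = (0, 1)\begin{pmatrix} a & b \\ c & d \end{pmatrix} = dy \circ d\phi_{(r,\theta)} = d(y \circ \phi)_{(r,\theta)} = \sin\theta\,dr + r\cos\theta\,d\theta = (\sin\theta, r\cos\theta).$$

So $d\phi_{(r,\theta)}$ is the matrix

$$\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix}.$$

Now

$$df = (3x^2 + y^2)\,dx + 2xy\,dy$$

and

$$d(f \circ \phi) = 3r^2\cos\theta\,dr - r^3\sin\theta\,d\theta.$$

In principle, the chain rule says

$$(3r^2\cos^2\theta + r^2\sin^2\theta,\ 2r^2\sin\theta\cos\theta)\begin{pmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{pmatrix} = (3r^2\cos\theta,\ -r^3\sin\theta).$$

This is, of course, correct. But in effect, the chain rule says substitute

$$x = r\cos\theta, \qquad dx = \cos\theta\,dr - r\sin\theta\,d\theta,$$

$$y = r\sin\theta, \qquad dy = \sin\theta\,dr + r\cos\theta\,d\theta$$

into the expression

$$df = (3x^2 + y^2)\,dx + (2xy)\,dy,$$

then multiply, collect coefficients and you will get

$$d(f \circ \phi) = 3r^2\cos\theta\,dr - r^3\sin\theta\,d\theta.$$

In other words, think of $x$ as a function of $r$ and $\theta$, which it becomes by the
map $\phi$, i.e., $x$ is replaced by the function $x \circ \phi = r\cos\theta$, and then take $d$ of this
function.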
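The substitution recipe is mechanical enough to be carried out by a few lines of code. The sketch below substitutes $x = r\cos\theta$, $y = r\sin\theta$ and the corresponding $dx$, $dy$ into $df = (3x^2 + y^2)\,dx + 2xy\,dy$, collects the coefficients of $dr$ and $d\theta$, and compares them with the closed form. The function names and the sample point are assumptions for illustration.

```python
import math

def substituted_form(r, theta):
    # coefficients of (dr, dtheta) after substituting into df
    x, y = r * math.cos(theta), r * math.sin(theta)
    dx = (math.cos(theta), -r * math.sin(theta))   # dx = cos(t) dr - r sin(t) dt
    dy = (math.sin(theta), r * math.cos(theta))    # dy = sin(t) dr + r cos(t) dt
    fx, fy = 3 * x ** 2 + y ** 2, 2 * x * y        # coefficients of dx, dy in df
    return (fx * dx[0] + fy * dy[0], fx * dx[1] + fy * dy[1])

def closed_form(r, theta):
    # coefficients of (dr, dtheta) in d(r^3 cos(theta))
    return (3 * r ** 2 * math.cos(theta), -r ** 3 * math.sin(theta))

r0, theta0 = 1.3, 0.4
from_substitution = substituted_form(r0, theta0)
from_closed_form = closed_form(r0, theta0)
gap = max(abs(a - b) for a, b in zip(from_substitution, from_closed_form))
print(from_substitution, from_closed_form)
```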

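In doing these computations it is convenient to remember that

$$d(gh) = g\,dh + h\,dg.$$

(Here, for example, in $\mathbb{R}^2$

$$h\,dg = h\frac{\partial g}{\partial x}\,dx + h\frac{\partial g}{\partial y}\,dy.)$$

Then

$$d[(gh) \circ \phi] = (g \circ \phi)\,d(h \circ \phi) + (h \circ \phi)\,d(g \circ \phi).$$

Thus, in our example

$$f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = x^3 + y^2x = (x^2 + y^2)x = gh$$

with

$$g = x^2 + y^2 \quad\text{and}\quad h = x.$$

Thus

$$g \circ \phi = r^2, \qquad h \circ \phi = r\cos\theta,$$

and

$$d(f \circ \phi) = r^2(\cos\theta\,dr - r\sin\theta\,d\theta) + r\cos\theta\,(2r\,dr) = 3r^2\cos\theta\,dr - r^3\sin\theta\,d\theta.$$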

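This procedure is completely general: let $y_1, \ldots, y_l$ denote the coordinate functions
on $\mathbb{R}^l$, so a typical point of $\mathbb{R}^l$ is

$$\begin{pmatrix} y_1 \\ \vdots \\ y_l \end{pmatrix}.$$

Let $f: \mathbb{R}^l \to \mathbb{R}$ be a differentiable function. Then

$$df = \frac{\partial f}{\partial y_1}\,dy_1 + \cdots + \frac{\partial f}{\partial y_l}\,dy_l.$$

Suppose that $x_1, \ldots, x_k$ are coordinates on $\mathbb{R}^k$. Let $\phi: \mathbb{R}^k \to \mathbb{R}^l$ be a differentiable map.
Define $\phi_1 = y_1 \circ \phi$, $\phi_2 = y_2 \circ \phi$, etc., so

$$\phi(\mathbf{v}) = \begin{pmatrix} \phi_1(\mathbf{v}) \\ \vdots \\ \phi_l(\mathbf{v}) \end{pmatrix} \quad\text{where } \mathbf{v} = \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix}.$$

Then

$$d\phi_1 = \frac{\partial \phi_1}{\partial x_1}\,dx_1 + \cdots + \frac{\partial \phi_1}{\partial x_k}\,dx_k$$

and the linear map $d\phi_{\mathbf{v}}$ is given by the matrix

$$d\phi = \begin{pmatrix} \dfrac{\partial \phi_1}{\partial x_1} & \cdots & \dfrac{\partial \phi_1}{\partial x_k} \\ \vdots & & \vdots \\ \dfrac{\partial \phi_l}{\partial x_1} & \cdots & \dfrac{\partial \phi_l}{\partial x_k} \end{pmatrix}.$$

The chain rule says that

$$d(f \circ \phi) = \left(\frac{\partial f}{\partial y_1} \circ \phi\right)d\phi_1 + \cdots + \left(\frac{\partial f}{\partial y_l} \circ \phi\right)d\phi_l$$

where the expressions $d\phi_i = \dfrac{\partial \phi_i}{\partial x_1}\,dx_1 + \cdots + \dfrac{\partial \phi_i}{\partial x_k}\,dx_k$ are used in this formula.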


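The existence of the partial derivatives with respect to $x$ and with respect to $y$ at a point $p$ does
not necessarily imply the differentiability of $f$ at $p$, as can be shown by some
pathological examples. Sufficient conditions for the differentiability of $f$ at $p$ are
given by the following theorem.

Theorem. Let $f: \mathbb{R}^2 \to \mathbb{R}^1$ have continuous partial derivatives $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ at $p$.
Then $f$ is differentiable at $p$.

Proof. If $f$ is differentiable at $p$, then the linear map $df_p$ must be given by

$$df_p\left[\begin{pmatrix} s \\ t \end{pmatrix}\right] = s\left.\frac{\partial f}{\partial x}\right|_p + t\left.\frac{\partial f}{\partial y}\right|_p.$$

Thus, if $\left.\dfrac{\partial f}{\partial x}\right|_p$ and $\left.\dfrac{\partial f}{\partial y}\right|_p$ exist, then $f$ is differentiable at $p$ if and only if

$$\nabla f_p\left[\begin{pmatrix} s \\ t \end{pmatrix}\right] = s\left.\frac{\partial f}{\partial x}\right|_p + t\left.\frac{\partial f}{\partial y}\right|_p + o(\mathbb{R}^2, \mathbb{R}^1).$$

Letting $p = \begin{pmatrix} x_p \\ y_p \end{pmatrix}$, we can expand $\nabla f_p\left[\begin{pmatrix} s \\ t \end{pmatrix}\right]$ as

$$\nabla f_p\left[\begin{pmatrix} s \\ t \end{pmatrix}\right] = f(x_p + s, y_p + t) - f(x_p, y_p)$$

$$= f(x_p + s, y_p + t) - f(x_p, y_p + t) + f(x_p, y_p + t) - f(x_p, y_p).$$

The continuity of $\dfrac{\partial f}{\partial x}$ and $\dfrac{\partial f}{\partial y}$ implies by the mean-value theorem of
one-variable calculus that

$$f(x_p + s, y_p + t) - f(x_p, y_p + t) = s\,\frac{\partial f}{\partial x}\left(\begin{pmatrix} x_0 \\ y_p + t \end{pmatrix}\right),$$

$$f(x_p, y_p + t) - f(x_p, y_p) = t\,\frac{\partial f}{\partial y}\left(\begin{pmatrix} x_p \\ y_0 \end{pmatrix}\right)$$

for some $x_0$ and $y_0$ satisfying $x_p < x_0 < x_p + s$ and $y_p < y_0 < y_p + t$. Therefore

$$\nabla f_p\left[\begin{pmatrix} s \\ t \end{pmatrix}\right] = s\,\frac{\partial f}{\partial x}\left(\begin{pmatrix} x_0 \\ y_p + t \end{pmatrix}\right) + t\,\frac{\partial f}{\partial y}\left(\begin{pmatrix} x_p \\ y_0 \end{pmatrix}\right)$$

so that

$$\nabla f_p\left[\begin{pmatrix} s \\ t \end{pmatrix}\right] - s\left.\frac{\partial f}{\partial x}\right|_p - t\left.\frac{\partial f}{\partial y}\right|_p = s\left[\frac{\partial f}{\partial x}\left(\begin{pmatrix} x_0 \\ y_p + t \end{pmatrix}\right) - \left.\frac{\partial f}{\partial x}\right|_p\right] + t\left[\frac{\partial f}{\partial y}\left(\begin{pmatrix} x_p \\ y_0 \end{pmatrix}\right) - \left.\frac{\partial f}{\partial y}\right|_p\right].$$

As $\begin{pmatrix} s \\ t \end{pmatrix}$ tends to zero, the coefficients of $s$ and $t$ each tend to zero so that these
terms are $o(\mathbb{R}^2, \mathbb{R}^1)$, which is what we needed to show.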






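$$\frac{\partial}{\partial x}\frac{\partial f}{\partial y} = \frac{\partial}{\partial y}\frac{\partial f}{\partial x}.$$

In order to make this argument work, we just need to take care in examining the
error terms. We can do this by appealing to the mean value theorem in the calculus
of one variable. Set

$$g(y) = f\left(\begin{pmatrix} x + s \\ y \end{pmatrix}\right) - f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right).$$

The function $g$ is differentiable in $y$ and our sum (5.20) is just

$$g(y + t) - g(y).$$

By the mean-value theorem

$$g(y + t) - g(y) = t\,g'(\bar{y})$$

where $\bar{y}$ is some point between $y$ and $y + t$. But

$$g'(\bar{y}) = \lim_{\varepsilon \to 0}\frac{1}{\varepsilon}\left(g(\bar{y} + \varepsilon) - g(\bar{y})\right)$$

$$= \lim_{\varepsilon \to 0}\frac{1}{\varepsilon}\left[f\left(\begin{pmatrix} x + s \\ \bar{y} + \varepsilon \end{pmatrix}\right) - f\left(\begin{pmatrix} x + s \\ \bar{y} \end{pmatrix}\right) - \left(f\left(\begin{pmatrix} x \\ \bar{y} + \varepsilon \end{pmatrix}\right) - f\left(\begin{pmatrix} x \\ \bar{y} \end{pmatrix}\right)\right)\right]$$

$$= \frac{\partial f}{\partial y}\left(\begin{pmatrix} x + s \\ \bar{y} \end{pmatrix}\right) - \frac{\partial f}{\partial y}\left(\begin{pmatrix} x \\ \bar{y} \end{pmatrix}\right).$$

By assumption, the function $\partial f/\partial y$ is differentiable. So applying the mean-value
theorem once more we get

$$\frac{\partial f}{\partial y}\left(\begin{pmatrix} x + s \\ \bar{y} \end{pmatrix}\right) - \frac{\partial f}{\partial y}\left(\begin{pmatrix} x \\ \bar{y} \end{pmatrix}\right) = s\,\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right)\left(\begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix}\right)$$

where $\bar{x}$ is some point between $x$ and $x + s$. Thus the sum given by (5.20) equals

$$st\,\frac{\partial}{\partial x}\frac{\partial}{\partial y}f\left(\begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix}\right).$$

By assumption, the function $\dfrac{\partial}{\partial x}\left(\dfrac{\partial f}{\partial y}\right)$ is continuous. So for any $\varepsilon > 0$ we can find a
$\delta > 0$ such that

$$\left|\frac{\partial}{\partial x}\frac{\partial}{\partial y}f\left(\begin{pmatrix} \bar{x} \\ \bar{y} \end{pmatrix}\right) - \frac{\partial}{\partial x}\frac{\partial}{\partial y}f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right)\right| < \varepsilon \quad\text{if } |s| + |t| < \delta.$$

Thus (5.20) becomes

$$st\left(\frac{\partial}{\partial x}\frac{\partial f}{\partial y}\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) + r_1\right)$$

where $|r_1| < \varepsilon$ if $|s| + |t| < \delta$. Similarly, (5.19) is

$$st\left(\frac{\partial}{\partial y}\frac{\partial f}{\partial x}\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) + r_2\right)$$

where $|r_2| < \varepsilon$. Assume $|st| > 0$. Dividing by $st$ we see that

$$\frac{\partial}{\partial x}\frac{\partial f}{\partial y}\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) - \frac{\partial}{\partial y}\frac{\partial f}{\partial x}\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = r_3$$

where

$$|r_3| \le |r_1| + |r_2| < 2\varepsilon \quad\text{if } |s| + |t| < \delta.$$

Since the left-hand side of this equation does not depend on $\delta$ - it is just the
difference between two numbers - and $r_3$ can be made as small as we like, we conclude
the equality of the crossed derivatives $\dfrac{\partial}{\partial x}\dfrac{\partial f}{\partial y}$ and $\dfrac{\partial}{\partial y}\dfrac{\partial f}{\partial x}$.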

5.5. Directional derivatives

Let $I$ be an interval containing 0 in $\mathbb{R}^1$ and let $\gamma: I \to V$ be differentiable at 0. (As
usual, $V$ can be any of our choices of vector spaces, but let us visualize the case where
$V = \mathbb{R}^2$.) Suppose that $\gamma(0) = \mathbf{x}$. We will use the notation $\gamma'(0)$ to denote the vector
$d\gamma_0(1)$ so that

$$\gamma'(0) = d\gamma_0(1) = \lim_{t \to 0}\left[(1/t)(\gamma(t) - \gamma(0))\right].$$


The vector $\gamma'(0)$ is called the tangent vector to the curve $\gamma$ at $t = 0$. If $\gamma_1$ is a second
curve with $\gamma_1(0) = \gamma(0)$ and $\gamma_1'(0) = \gamma'(0)$, then we say that $\gamma$ and $\gamma_1$ are tangent at 0, or
agree to first order at 0. If $\gamma$ is tangent to $\gamma_1$ at zero and $\gamma_1$ is tangent to $\gamma_2$ at zero, then
clearly $\gamma$ is tangent to $\gamma_2$ at zero. In other words, we have defined an equivalence
relation on differentiable curves; two curves are equivalent if they agree to first order
at zero. If $\gamma'(0) = \mathbf{v}$, then the pair $\{\mathbf{x}, \mathbf{v}\}$ determines the equivalence class. We visualize
this equivalence class as a (little) vector $\mathbf{v}$ whose tail starts at $\mathbf{x}$, and we call it a tangent
vector at $\mathbf{x}$. Any $\mathbf{x}$ and $\mathbf{v}$ comes from an equivalence class, because we can always
consider the straight line curve

$$\gamma(t) = \mathbf{x} + t\mathbf{v}$$

which satisfies $\gamma(0) = \mathbf{x}$ and $\gamma'(0) = \mathbf{v}$. We will sometimes use a single Greek letter such
as $\xi$ for a tangent vector at $\mathbf{x}$. So $\xi$ specifies both $\mathbf{x}$ and $\mathbf{v}$.

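Suppose that $V = \mathbb{R}^2$. The curve $\gamma$ is then specified by giving the two functions $x \circ \gamma$
and $y \circ \gamma$, usually written as $x(t)$ and $y(t)$. Thus, for example

$$x(t) = t\sin t + 1$$

$$y(t) = e^t$$

specifies the curve

$$\gamma(t) = \begin{pmatrix} t\sin t + 1 \\ e^t \end{pmatrix}$$

with

$$\gamma(0) = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

and

$$\gamma'(0) = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Notice that

$$d(x \circ \gamma) = d(t\sin t + 1) = (\sin t + t\cos t)\,dt$$

$$d(y \circ \gamma) = d(e^t) = e^t\,dt$$

so that the first and second coordinates of

$$\gamma'(t) = \begin{pmatrix} \sin t + t\cos t \\ e^t \end{pmatrix}$$

can be recovered as the coefficients of $dt$ in $d(x \circ \gamma)$ and $d(y \circ \gamma)$.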

Let $f: V \to \mathbb{R}$ be a function defined in some neighborhood of $p$. For each curve $\gamma$
with $\gamma(0) = p$, the function $f \circ \gamma$ is defined near 0 in $\mathbb{R}$. If $f$ is differentiable at $p$ and
$\gamma$ is differentiable at 0, then, by the chain rule, $f \circ \gamma$ is a (real-valued) function which is
differentiable at 0 and its derivative is given by

$$(f \circ \gamma)'(0) = df_p(\gamma'(0)).$$

In terms of our differential form notation in $\mathbb{R}^2$, we would substitute $d(x \circ \gamma)$ for $dx$,
$d(y \circ \gamma)$ for $dy$ and $(\partial f/\partial x) \circ \gamma$, $(\partial f/\partial y) \circ \gamma$ for $\partial f/\partial x$ and $\partial f/\partial y$ in the expression for $df$. Thus, in
our preceding example, if we took

$$f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = x^2 + y^2,$$

$$df = 2x\,dx + 2y\,dy,$$

$$d(f \circ \gamma) = 2(t\sin t + 1)(\sin t + t\cos t)\,dt + 2e^t \cdot e^t\,dt$$

$$= \left[2(t\sin t + 1)(\sin t + t\cos t) + 2e^{2t}\right]dt.$$

The coefficient of $dt$ is $(f \circ \gamma)'(t)$. Setting $t = 0$ gives $(f \circ \gamma)'(0)$.


Notice that $(f \circ \gamma)'(0)$ depends on $p$ and $\gamma'(0)$ but on no further information about
the curve $\gamma$. In short, it depends on the tangent vector $\xi$. We shall write this value as
$D_\xi f$. We call $D_\xi f$ the directional derivative of $f$ with respect to $\xi$. Thus

$$D_\xi f = df_p(\mathbf{v}) \quad\text{if } \xi = \{p, \mathbf{v}\}.$$

For example, if $\mathbf{v} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$ then $D_\xi f = \dfrac{\partial f}{\partial x}(p)$. Let $f_1$ and $f_2$ be two functions which are
differentiable at $\mathbf{x}$, and let $f = f_1 + f_2$. Let $\gamma$ be a curve passing through $\mathbf{x}$ whose
tangent vector at $\mathbf{x}$ is $\xi$. From the calculus of functions of one variable we know that

$$(f \circ \gamma)'(0) = (f_1 \circ \gamma)'(0) + (f_2 \circ \gamma)'(0)$$

and so we conclude that

$$D_\xi(f_1 + f_2) = D_\xi f_1 + D_\xi f_2.$$

Similarly, if we set $h = f_1 f_2$, we know from elementary calculus that

$$(h \circ \gamma)'(0) = (f_1 \circ \gamma)'(0)(f_2 \circ \gamma)(0) + (f_1 \circ \gamma)(0)(f_2 \circ \gamma)'(0)$$

$$= (f_1 \circ \gamma)'(0)f_2(\mathbf{x}) + f_1(\mathbf{x})(f_2 \circ \gamma)'(0)$$

since $(f_1 \circ \gamma)(0) = f_1(\gamma(0)) = f_1(\mathbf{x})$ and similarly for $f_2$. Thus we can write

$$D_\xi(f_1 f_2) = (D_\xi f_1)f_2 + f_1 D_\xi f_2.$$


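Another example of the directional derivative follows. Let $\gamma: \mathbb{R} \to \mathbb{R}^2$ and $f: \mathbb{R}^2 \to \mathbb{R}$ with

$$\gamma(t) = \begin{pmatrix} t - 1 \\ t^2 + 2t + 2 \end{pmatrix}, \qquad f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = x^2y + y^3.$$

Then $df_{(x,y)}$ and $\xi = \{\gamma(0), \gamma'(0)\}$ are given by

$$df_{(x,y)} = (2xy,\ x^2 + 3y^2), \qquad \xi = \left\{\begin{pmatrix} -1 \\ 2 \end{pmatrix}, \begin{pmatrix} 1 \\ 2 \end{pmatrix}\right\}$$

so that

$$D_\xi(f) = df_{(-1,2)}\left(\begin{pmatrix} 1 \\ 2 \end{pmatrix}\right) = (-4, 13)\begin{pmatrix} 1 \\ 2 \end{pmatrix} = -4 + 26 = 22.$$

To verify that this equals $(f \circ \gamma)'(0)$, we note that

$$f \circ \gamma(t) = (t - 1)^2(t^2 + 2t + 2) + (t^2 + 2t + 2)^3,$$

$$(f \circ \gamma)'(t) = 2(t - 1)(t^2 + 2t + 2) + (t - 1)^2(2t + 2) + 3(t^2 + 2t + 2)^2(2t + 2)$$

so that

$$(f \circ \gamma)'(0) = 2(-1)(2) + (-1)^2(2) + 3(2)^2(2) = 22.$$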
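The value 22 can be reproduced in two independent ways: by applying the row vector $df_p$ to the tangent vector, and by differentiating $f \circ \gamma$ numerically. The sketch below uses the curve and function of this example; the step size `h` and the variable names are assumptions.

```python
def gamma(t):
    # the curve of the example
    return (t - 1.0, t * t + 2.0 * t + 2.0)

def f(x, y):
    return x * x * y + y ** 3

def df(x, y):
    # row vector (df/dx, df/dy)
    return (2.0 * x * y, x * x + 3.0 * y * y)

p = gamma(0.0)            # base point (-1, 2)
v = (1.0, 2.0)            # tangent vector gamma'(0)
row = df(*p)              # (-4, 13)
directional = row[0] * v[0] + row[1] * v[1]

h = 1e-5
# central difference quotient of f o gamma at t = 0
numeric = (f(*gamma(h)) - f(*gamma(-h))) / (2.0 * h)
print(directional, numeric)
```

Replacing `gamma` by any other curve through $(-1, 2)$ with the same tangent vector, such as the straight line $t \mapsto (-1 + t, 2 + 2t)$, should give the same value, since $D_\xi f$ depends only on $\xi$.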

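As an example of the formula for the directional derivative of a product, let
$g: \mathbb{R}^2 \to \mathbb{R}$ be given by $g\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = x^2 - y^2$. Then the product mapping $fg: \mathbb{R}^2 \to \mathbb{R}$
is

$$fg\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = f\left(\begin{pmatrix} x \\ y \end{pmatrix}\right)g\left(\begin{pmatrix} x \\ y \end{pmatrix}\right) = (x^2y + y^3)(x^2 - y^2).$$

The differentials $dg$ and $d(fg)$ are given by

$$dg_{(x,y)} = (2x, -2y)$$

$$d(fg)_{(x,y)} = \left(2xy(x^2 - y^2) + (x^2y + y^3)(2x),\ (x^2 + 3y^2)(x^2 - y^2) + (x^2y + y^3)(-2y)\right).$$

We then have

$$D_\xi(fg) = d(fg)_{(-1,2)}\left(\begin{pmatrix} 1 \\ 2 \end{pmatrix}\right)$$

$$= \left(-4(-3) + 10(-2),\ 13(-3) + 10(-4)\right)\begin{pmatrix} 1 \\ 2 \end{pmatrix}$$

$$= -8 - 158 = -166.$$

By the product formula, $D_\xi(fg)$ must also be given by

$$D_\xi(fg) = (D_\xi f)\,g\left(\begin{pmatrix} -1 \\ 2 \end{pmatrix}\right) + f\left(\begin{pmatrix} -1 \\ 2 \end{pmatrix}\right)D_\xi g$$

with

$$g\left(\begin{pmatrix} -1 \\ 2 \end{pmatrix}\right) = (-1)^2 - 2^2 = -3, \qquad f\left(\begin{pmatrix} -1 \\ 2 \end{pmatrix}\right) = (-1)^2 \cdot 2 + 2^3 = 10,$$

$$D_\xi(g) = dg_{(-1,2)}\left(\begin{pmatrix} 1 \\ 2 \end{pmatrix}\right) = (-2, -4)\begin{pmatrix} 1 \\ 2 \end{pmatrix} = -10$$

so that

$$D_\xi(fg) = 22(-3) + 10(-10) = -166$$

which agrees with the previous calculation.

It will be convenient for us to think of the
set of all tangent vectors at $\mathbf{x}$ as constituting a vector space, called the tangent space
at $\mathbf{x}$ and denoted by $TV_{\mathbf{x}}$. Thus, if $\xi = \{\mathbf{x}, \mathbf{v}\}$ and $\eta = \{\mathbf{x}, \mathbf{w}\}$ are two tangent vectors at
$\mathbf{x}$, then their sum is defined as $\xi + \eta = \{\mathbf{x}, \mathbf{v} + \mathbf{w}\}$. Similarly, if $\xi = \{\mathbf{x}, \mathbf{v}\}$ and $a$ is any
real number, then $a\xi = \{\mathbf{x}, a\mathbf{v}\}$. In short, $TV_{\mathbf{x}}$ looks just like $V$ except that it has the
extra dummy label $\mathbf{x}$ attached to everything. At present this seems like a
cumbersome piece of excess notational baggage, but its value will become clear later.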

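If $\xi = \{\mathbf{x}, \mathbf{v}\}$ and $\eta = \{\mathbf{x}, \mathbf{w}\}$, then

$$D_{\xi + \eta}f = df_{\mathbf{x}}(\mathbf{v} + \mathbf{w}) = df_{\mathbf{x}}(\mathbf{v}) + df_{\mathbf{x}}(\mathbf{w})$$

so

$$D_{\xi + \eta}f = D_\xi f + D_\eta f.$$

Similarly,

$$D_{a\xi}f = aD_\xi f.$$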



5.6. The pullback notation 



Let $\phi: V \to W$ be a differentiable function with $\phi(\mathbf{x}) = \mathbf{y}$. If $f: W \to \mathbb{R}$ is a function
defined near $\mathbf{y}$, then $f \circ \phi$ is a function defined near $\mathbf{x}$. In order to emphasize a point of
view which will be central in this book, we will denote this function by $\phi^*f$ and call it
the pullback of $f$ under $\phi$. So

$$\phi^*f = f \circ \phi,$$

$$(\phi^*f)(\mathbf{x}) = f(\phi(\mathbf{x})).$$

We think of $\phi$ as fixed and $f$ as varying, so that $\phi^*$ pulls all functions on $W$ back to $V$.
Notice that

$$\phi^*(f_1 + f_2) = \phi^*f_1 + \phi^*f_2.$$

We should pause to explain our point of view about these equations. We have
an $r\theta$-plane and an $xy$-plane. We are thinking of $\phi$ as the map which assigns to each
point of the $r\theta$-plane a point in the $xy$-plane. We are considering $x$ as a function on
the $xy$-plane: that function which assigns to each point its $x$-coordinate. Then $\phi^*x$
is the function on the $r\theta$-plane obtained by composing $x$ with $\phi$.















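Then, if $x, y, z$ denote the three coordinates on $\mathbb{R}^3$ (and $\phi$ is the map with
$\phi\left(\begin{pmatrix} r \\ s \end{pmatrix}\right) = \begin{pmatrix} r^2 \\ rs \\ s^2 \end{pmatrix}$), then

$$\phi^*x = r^2 \quad\text{so}\quad \phi^*dx = 2r\,dr,$$

$$\phi^*y = rs \quad\text{so}\quad \phi^*dy = s\,dr + r\,ds,$$

$$\phi^*z = s^2 \quad\text{so}\quad \phi^*dz = 2s\,ds.$$

For any function $f$ on $\mathbb{R}^3$

$$df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy + \frac{\partial f}{\partial z}\,dz$$

and we can compute $\phi^*(df)$ in either of two ways; either as

$$d(\phi^*f)$$

or as

$$\phi^*(df) = \phi^*\left(\frac{\partial f}{\partial x}\right)\phi^*dx + \phi^*\left(\frac{\partial f}{\partial y}\right)\phi^*dy + \phi^*\left(\frac{\partial f}{\partial z}\right)\phi^*dz.$$

For example, suppose that

$$f\left(\begin{pmatrix} x \\ y \\ z \end{pmatrix}\right) = y^2 - xz.$$

Then

$$df = -z\,dx + 2y\,dy - x\,dz.$$

Here $\phi^*f = (rs)^2 - r^2s^2 = 0$, so

$$\phi^*df = d(\phi^*f) = 0.$$

Computing $\phi^*df$ directly, we get

$$-s^2 \cdot 2r\,dr + 2sr(s\,dr + r\,ds) - r^2 \cdot 2s\,ds = 0.$$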
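The two routes to $\phi^*(df)$ can be compared numerically. The sketch below assumes the map $\phi(r, s) = (r^2, rs, s^2)$ implied by the pullbacks $\phi^*dx = 2r\,dr$, $\phi^*dy = s\,dr + r\,ds$, $\phi^*dz = 2s\,ds$ used in the direct computation, together with $f = y^2 - xz$; the function names and the sample point are illustrative.

```python
def phi(r, s):
    # assumed map from the rs-plane to R^3
    return (r * r, r * s, s * s)

def f(x, y, z):
    return y * y - x * z

def pullback_of_df(r, s):
    # substitute x, y, z and dx, dy, dz into df = -z dx + 2y dy - x dz
    x, y, z = phi(r, s)
    fx, fy, fz = -z, 2.0 * y, -x
    dx = (2.0 * r, 0.0)     # coefficients of (dr, ds) in phi* dx
    dy = (s, r)             # coefficients of (dr, ds) in phi* dy
    dz = (0.0, 2.0 * s)     # coefficients of (dr, ds) in phi* dz
    return (fx * dx[0] + fy * dy[0] + fz * dz[0],
            fx * dx[1] + fy * dy[1] + fz * dz[1])

r0, s0 = 1.5, -2.0
pulled_back_value = f(*phi(r0, s0))      # (phi* f)(r0, s0)
coefficients = pullback_of_df(r0, s0)    # coefficients of dr and ds in phi* df
print(pulled_back_value, coefficients)
```

Both routes give zero: $\phi^*f$ vanishes identically, so $d(\phi^*f) = 0$, and the substituted form has vanishing coefficients.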


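The general situation is now clear. If $x_1, \ldots, x_k$ are the coordinates on $\mathbb{R}^k$ and
$y_1, \ldots, y_l$ are the coordinates on $\mathbb{R}^l$, then a differentiable map $\phi: \mathbb{R}^k \to \mathbb{R}^l$ is given by

$$\phi(\mathbf{v}) = \begin{pmatrix} \phi_1(\mathbf{v}) \\ \vdots \\ \phi_l(\mathbf{v}) \end{pmatrix}, \qquad \mathbf{v} = \begin{pmatrix} x_1 \\ \vdots \\ x_k \end{pmatrix}.$$

Then

$$\phi^*y_1 = \phi_1, \qquad \phi^*dy_1 = d\phi_1 = \frac{\partial \phi_1}{\partial x_1}\,dx_1 + \cdots + \frac{\partial \phi_1}{\partial x_k}\,dx_k,$$

$$\vdots$$

$$\phi^*y_l = \phi_l, \qquad \phi^*dy_l = d\phi_l.$$

The pullback $\phi^*$ carries functions and linear differential forms on $\mathbb{R}^l$ to functions and
linear differential forms on $\mathbb{R}^k$. All algebraic operations,
adding or multiplying two functions, adding two forms, multiplying a function by a
form, are preserved by $\phi^*$. Furthermore the chain rule says that

$$\phi^*df = d\phi^*f,$$

for any function $f$ on $\mathbb{R}^l$.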


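Suppose that we have

$$\phi: V \to W \quad\text{and}\quad \psi: W \to Z.$$

We can compose the two maps to obtain

$$\psi \circ \phi: V \to Z.$$

If $g$ is any function on $Z$, we can form

$$\psi^*g = g \circ \psi$$

which is a function on $W$ and then

$$\phi^*(\psi^*g) = (g \circ \psi) \circ \phi$$

which is a function on $V$. By the associative law for composition, we know that
$(g \circ \psi) \circ \phi = g \circ (\psi \circ \phi)$. Thus

$$\phi^*(\psi^*g) = (\psi \circ \phi)^*g$$

so that on functions

$$(\psi \circ \phi)^* = \phi^* \circ \psi^*.$$

Notice the reversal of the order.

Suppose $V = \mathbb{R}^k$, $W = \mathbb{R}^l$ and $Z = \mathbb{R}^m$ with coordinates $x^1, \ldots, x^k$; $y^1, \ldots, y^l$;
$z^1, \ldots, z^m$; then if

$$\omega = a_1\,dz^1 + \cdots + a_m\,dz^m$$

is a differential form on $\mathbb{R}^m$, then

$$\psi^*\omega$$

is a differential form on $\mathbb{R}^l$ and

$$\phi^*(\psi^*\omega)$$

is a differential form on $\mathbb{R}^k$. It follows from the chain rule that

$$\phi^*\psi^*df = \phi^*(d\psi^*f) = d(\phi^*\psi^*f) = d((\psi \circ \phi)^*f) = (\psi \circ \phi)^*df.$$

Since $\phi^*$ and $\psi^*$ preserve all algebraic operations, so does $(\psi \circ \phi)^*$, and hence

$$\phi^*\psi^*(g\,df) = (\psi \circ \phi)^*(g\,df),$$

and since the most general linear differential form is a sum of terms like $g\,df$, we
conclude that

$$(\psi \circ \phi)^* = \phi^* \circ \psi^*$$

for all linear differential forms.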

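Let $\gamma: \mathbb{R} \to V$ be a curve passing through $\mathbf{x}$, i.e., $\gamma(0) = \mathbf{x}$. Then $\phi \circ \gamma$ is a curve
passing through $\mathbf{y}$. If $\gamma$ is differentiable at 0, then so is $\phi \circ \gamma$ and, by the chain rule,

$$(\phi \circ \gamma)'(0) = d\phi_{\mathbf{x}}(\gamma'(0)).$$

The right-hand side of this equation depends only on the tangent vector $\xi$ associated
to $\gamma$. Thus $d\phi_{\mathbf{x}}$ maps tangent vectors at $\mathbf{x}$ to tangent vectors at $\phi(\mathbf{x})$:

$$d\phi_{\mathbf{x}}: TV_{\mathbf{x}} \to TW_{\phi(\mathbf{x})}$$

where we define

$$d\phi_{\mathbf{x}}\xi = \{\phi(\mathbf{x}), d\phi_{\mathbf{x}}(\mathbf{v})\} \quad\text{if } \xi = \{\mathbf{x}, \mathbf{v}\}.$$

We can thus visualize the differential $d\phi_{\mathbf{x}}$ as taking infinitesimal curves through $\mathbf{x}$
into infinitesimal curves through $\phi(\mathbf{x})$.

Now let $f: W \to \mathbb{R}$ be a function which is differentiable at $\phi(\mathbf{x})$. Then by the
associative law for composition,

$$(\phi^*f) \circ \gamma = (f \circ \phi) \circ \gamma = f \circ (\phi \circ \gamma).$$

Differentiating and setting $t = 0$ gives

$$D_\xi(\phi^*f) = D_{d\phi_{\mathbf{x}}\xi}f.$$

This means we can pull $f$ back by $\phi$ and then take the directional derivative with
respect to $\xi$, or we can push $\xi$ forward by $d\phi_{\mathbf{x}}$ and then take the directional derivative
of $f$.

Letting $\xi = \{\mathbf{x}, \mathbf{v}\}$, the above identity is given explicitly by

$$D_{\{\mathbf{x}, \mathbf{v}\}}(\phi^*f) = D_{\{\phi(\mathbf{x}), d\phi_{\mathbf{x}}[\mathbf{v}]\}}(f)$$

or equivalently,

$$d(f \circ \phi)_{\mathbf{x}}[\mathbf{v}] = df_{\phi(\mathbf{x})}[d\phi_{\mathbf{x}}[\mathbf{v}]]$$

which states that

$$d(f \circ \phi)_{\mathbf{x}} = df_{\phi(\mathbf{x})} \circ d\phi_{\mathbf{x}}$$

which is a special case of the chain rule.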

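As an example of this identity, let $f\colon W\to\mathbb{R}$ and $\phi\colon V\to W$ be given by

$$f\begin{pmatrix}x\\y\end{pmatrix} = x^2 y, \qquad \phi\begin{pmatrix}r\\\theta\end{pmatrix} = \begin{pmatrix}r\cos\theta\\ r\sin\theta\end{pmatrix}.$$

Then

$$df_{(x,y)} = (2xy,\ x^2), \qquad d\phi_{(r,\theta)} = \begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix}$$

so that, with $v = \begin{pmatrix}v_r\\ v_\theta\end{pmatrix}$,

$$D_{\{\phi(r,\theta),\,d\phi[v]\}}(f) = df_{\phi(r,\theta)}\big[d\phi_{(r,\theta)}[v]\big]$$

$$= \big(2(r\cos\theta)(r\sin\theta),\ (r\cos\theta)^2\big)\begin{pmatrix}\cos\theta & -r\sin\theta\\ \sin\theta & r\cos\theta\end{pmatrix}\begin{pmatrix}v_r\\ v_\theta\end{pmatrix}$$

$$= \big(2r^2\cos\theta\sin\theta,\ r^2\cos^2\theta\big)\begin{pmatrix}v_r\cos\theta - r v_\theta\sin\theta\\ v_r\sin\theta + r v_\theta\cos\theta\end{pmatrix}$$

$$= 2r^2\cos\theta\sin\theta\,(v_r\cos\theta - r v_\theta\sin\theta) + r^2\cos^2\theta\,(v_r\sin\theta + r v_\theta\cos\theta)$$

$$= 3r^2\cos^2\theta\sin\theta\; v_r + r^3(-2\cos\theta\sin^2\theta + \cos^3\theta)\,v_\theta.$$

To verify that this equals $D_{\{(r,\theta),\,v\}}(\phi^* f)$, we note that

$$\phi^* f\begin{pmatrix}r\\\theta\end{pmatrix} = f\begin{pmatrix}r\cos\theta\\ r\sin\theta\end{pmatrix} = (r\cos\theta)^2\, r\sin\theta = r^3\cos^2\theta\sin\theta$$

so that

$$d(\phi^* f)_{(r,\theta)} = \big(3r^2\cos^2\theta\sin\theta,\ r^3(2\cos\theta(-\sin\theta)\sin\theta + \cos^3\theta)\big).$$

We then have

$$D_{\{(r,\theta),\,v\}}(\phi^* f) = d(\phi^* f)_{(r,\theta)}[v] = 3r^2\cos^2\theta\sin\theta\; v_r + r^3(-2\cos\theta\sin^2\theta + \cos^3\theta)\,v_\theta.$$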
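
The identity in this example can also be checked numerically. The following is a minimal sketch, not part of the original text: the function names are hypothetical, and the directional derivative of the pullback is approximated by a central difference with an assumed step size `h`.

```python
import math

def f(x, y):
    # The function f(x, y) = x^2 y from the example.
    return x * x * y

def phi(r, theta):
    # The polar coordinate map from the example.
    return (r * math.cos(theta), r * math.sin(theta))

def pullback(r, theta):
    # (phi^* f)(r, theta) = f(phi(r, theta))
    return f(*phi(r, theta))

def directional_derivative_of_pullback(r, theta, vr, vt, h=1e-6):
    # Central-difference approximation to D_{(r,theta),v}(phi^* f).
    plus = pullback(r + h * vr, theta + h * vt)
    minus = pullback(r - h * vr, theta - h * vt)
    return (plus - minus) / (2 * h)

def pushforward_formula(r, theta, vr, vt):
    # df at phi(r, theta), applied to d(phi)[v].
    c, s = math.cos(theta), math.sin(theta)
    wx = c * vr - r * s * vt      # first component of d(phi)[v]
    wy = s * vr + r * c * vt      # second component of d(phi)[v]
    x, y = phi(r, theta)
    return 2 * x * y * wx + x * x * wy

def closed_form(r, theta, vr, vt):
    # The final expression obtained in the text.
    c, s = math.cos(theta), math.sin(theta)
    return 3 * r**2 * c**2 * s * vr + r**3 * (-2 * c * s**2 + c**3) * vt

if __name__ == "__main__":
    args = (1.3, 0.7, 0.4, -1.1)
    print(directional_derivative_of_pullback(*args))
    print(pushforward_formula(*args))
    print(closed_form(*args))
```

The three printed values should agree up to the finite-difference error.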


Summary 


A Differentials and partial derivatives 

You should be able to state the definition of the differential $df$ of a function $f$ in terms of 'o' and 'O' notation.

You should be able to state and apply the rules for differentiating the sum, product, or composition of functions.

You should be able to express the differential of a function in terms of partial derivatives and to construct the matrix that represents the differential of a function $f\colon \mathbb{R}^2\to\mathbb{R}^2$.


B Coordinate transformations 

Given a transformation that can be used to introduce new coordinates on the plane, you should be able to use the chain rule to express differentials and partial derivatives in terms of the new coordinates.


ions of differentials 



graph of a function at a given point.

You should be able to use the chain rule to solve 'related rate' problems that involve functions on the plane.


Exercises 

5.1. Show that if $f\colon V\to W$ is differentiable at $\alpha$ and if $T\colon W\to Z$ is linear, then $T\circ f$ is differentiable at $\alpha$ and

$$d(T\circ f)_\alpha = T\circ df_\alpha.$$

5.2. Let $F\colon V\to\mathbb{R}$ be differentiable at $\alpha$ and let $f\colon\mathbb{R}\to\mathbb{R}$ be a function whose derivative exists at $a = F(\alpha)$. Prove that $f\circ F$ is differentiable at $\alpha$ and that

$$d(f\circ F)_\alpha = f'(a)\,dF_\alpha.$$


5.4. Lei f:V- 
and that 


dG p = ( dFJl 


dg„= nf n ~ i df ri 

/ 


5.5. Let y: IR -* R 2 denote the curve y(tl = 

, and let F: IR 2 -> IR 2 be the 

\smt/ 


mapping 

// Y \\ /T. Y 2 , A 


F\ ( ) = \ 

. 

\\y)j Wy 3 j 



(a) Compute the tangent vector for y at t = 0 and t — nil 

(b) Find the directional derivative of F with respect to each of these 
tangent vectors. 

5.6. Let /: IR 2 -> IR 2 and g : U 2 -> IR 2 be given by 


Verify the chain rule for the mapping g°f\ 


5.7. Let g:V->W be the mapping g 
straight line 


cos xy 


, and let X : IR -> Fbe the 



(a) Find th e tangent v e ctor at 


to the curve g°X. 


-1 


(b) Compute the directional derivative 


D (— 3 ) (i)(g) in two ways. 


5.8. Let 4>: V-* W be the mapping^: 


rcosd 


r sin 9 , 


t and let/: W R be given 


- ( x \ -, a - 

by /: ~ -»x 3 y 4 . Verify that 


W 

for all tangent vectors £ = (a, v). 

5 9 Define mappings F: IR 2 -*■ IR 2 , G: 1R 2 -*• IR 2 , /: IR 1 

€))■(■':/>»((;)) 

/1 2 + 1 + cos t \ //x\ 

M 3t + 2 > <(,) 


► IR 2 , and g: IR 2 ■ 

■(?> 

= (x 3 y) 


l 1 by 


Verify that 

(a) d(F= G ) (J) = d fG((;r d G(;) 

(b) d (G°FU = dG 




(c) d(g°/) frt = dg fm °dF, 


10 _ 


WWt = d/ 




' g ((;))° dg (;) 


5.10. Let/: IR 2 -> IR be differentiable in some neighborhood of I and satisfy 

\yoJ 


-n 


Xn 




the mapping g, 


,yo, 


,y 0 . 




given by g 


W 


= l//l 


is differentiable and that 


Jv 




d g 


(*0) 

Co' 


df Ol V{\y 0 . 


5.11. A function/ on the plane is defined in terms of affine coordinates x and y 
by 


(a) Is / continuous at the origin P o (x = 0,y = 0)? Justify your answer 
carefully in terms of the definition of continuity. 

(b) Is /differentiable at the origin? Justify your answer carefully in terms of 
the definition of differentiability. 


5.12. So-called parabolic coordinates on the plane are defined in terms of 
Cartesian coordinates x and y by 






/ \77 ( ( dx\ 1 

(a) Express! ) in terms of ) bv means of a 2 x 2 matrix, then invert 

\dvj \dyj 

-— - f dx\ /du\ 

this matrix to express ( in terms of , 

\dy) \dvj 

___,_ /x\ _ 

(b) Invert the coordinate transformation by solving for in terms of u 

\y/ 

/dx\ / du\ 

and v. Differentiate to express in terms of 

\dyj \dvj 

(c) Show that the curves u = constant and v — constant are parabolas 
which are perpendicular where they cross. Sketch these families of 
curves. 

(d) Consider the function $f$ on the plane defined by $f(p) = 1/(u(p) + v(p))$. Express $df_q$, where $q$ is the point with coordinates $u(q) = 4$, $v(q) = 16$, in terms of $du$ and $dv$, then in terms of $dx$ and $dy$.

(e) Suppose that a particle moves along the path defined by the function 
a: R! -»R 2 such that 


°a(t) = 


t 2 +1 


Calculate the derivatives of x°a, y°a, u°a, and v°a at t = 2. 


The equations 


the point u = 2,v = 
this p oi n t/ 


/dx\ fdu\ 

ly, express I I in terms Of I T at 

\dyj \dvj 


(b) Consider the function f(u, v) = uv. Find the equation, in terms of x and y, of the line tangent to the curve f(u, v) = 4 at the point u = 2, v = 1 (i.e., at x = 4, y = 3). (Do not try to solve for u and v as functions of x and y; just use the chain rule.)

(c) Suppose that a particle moves along the path 


At the instant t = 2, when the particle is passing through the point 
^ ^ ^ at what rate are its u and v coordinates changing; i.e., what 
are du/dt and dv/dt at this instant? 

5.14. Let A denote an affine plane, let P 0 be a point in this plane. Invent a 
function/: A -* IR, satisfying /(P 0 ) = 0, which has the property that for any 

affine coordinates s(P),t(P) on the plane, ( —) and( — ) are defined and 

\ds/ t \dt/ s 

equal to zero at P n , yet f is not differentiable at P n . (Hint: replace 





one coordinate’, and an answer would be 

f°at P 0 , 
f(P)=< x 2 y 

— otherwis e 

_ kx 4 + v 2 _ 

wh e r e x(P 0 ) and y(P 0 ) are both z e ro.) 

5.15. In Quadratic Crater National Monument, the altitude above sea level is 
described by the function 

$$z(x, y) = \sqrt{x^2 + 4y^2}, \quad (x, y, z \text{ in kilometers}).$$

The Fahrenheit temperature is described by 

T(x, y) = 100 + 2x — \x 2 y 2 . 

(a) Express d z and dTin terms of dx and dy at the point x = 3, y = 2. 

(b) Find the equation of the tangent plane to the crater at the point x = 3, 

y= 2. 

(c) At the point x = 3, y = 2, along what direction is the temperature 
changing most rapidly? If one follows a path along this direction, what 
is the rate of change of temperature with respect to altitude (accurate 
to the nearest degree per kilometer)?

In Chapter 6 we continue the study of the differential calculus. We present the vector versions of the mean-value theorem, of Taylor's formula and of the inverse function theorem. We discuss critical point behavior and Lagrange multipliers. You might want to read the chapter quickly without concentrating on details of the proofs. But do the exercises.


6.1. The mean-value theorem

This is one of the few theorems that we will not be able to state, in the higher-dimensional calculus, with the same degree of precision as in the one-variable case. In the one-variable case it says that if $f$ is differentiable on some interval $[a, b]$, then

$$f(b) - f(a) = f'(z)(b - a) \qquad (6.1)$$

where z is some interior point of the interval. The point z is in fact difficult to 
determine explicitly, and the mean-value theorem is usually applied as an inequality: 

If $f'(x) \le m$ for all $x \in [a, b]$, then $f(b) - f(a) \le m(b - a)$. (6.2)

This inequality is of course an immediate consequence of the mean-value theorem as 
stated above, since f'(z ) ^ m. But it is easy to give a direct proof of this inequality 
using the fundamental theorem of the calculus: 


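$$f(b) - f(a) = \int_a^b f'(s)\,ds \le \int_a^b m\,ds = m(b - a).$$

(It will be convenient to rewrite this argument using the change of variables $s = a + t(b - a)$:

$$f(b) - f(a) = \int_0^1 \frac{d}{dt} f(a + t(b - a))\,dt = (b - a)\int_0^1 f'(a + t(b - a))\,dt \le m(b - a). \qquad (6.4)$$

Notice that the second equality involved a use of the chain rule.) One advantage of (6.2) or (6.4) over the original mean-value theorem (6.1) is that it extends immediately to the case where $f$ is a mapping from $\mathbb{R}$ to $\mathbb{R}^k$. Suppose that $f$ is such a map, so $f$ is given as a function: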

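$$f(t) = \begin{pmatrix} f_1(t)\\ \vdots\\ f_k(t)\end{pmatrix}.$$

Then $f'(t) = \lim_{h\to 0}(1/h)(f(t + h) - f(t)) = df_t[1]$ (where we think of 1 as a vector in $\mathbb{R}^1$ in the notation of the preceding sections). Clearly the vector $f'(t)$ is given as

$$f'(t) = \begin{pmatrix} f_1'(t)\\ \vdots\\ f_k'(t)\end{pmatrix}$$

and, arguing component by component as in (6.4),

$$f(b) - f(a) = \int_0^1 \frac{d}{dt} f(a + t(b - a))\,dt = (b - a)\int_0^1 f'(a + t(b - a))\,dt. \qquad (6.5)$$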


By the integral of a vector-valued function $g$ we simply mean the vector whose components are the integrals of the components:

$$\text{If } g = \begin{pmatrix} g_1\\ \vdots\\ g_k\end{pmatrix} \text{ then } \int g(t)\,dt = \begin{pmatrix} \int g_1(t)\,dt\\ \vdots\\ \int g_k(t)\,dt\end{pmatrix}.$$

Of course, we have the direct definition of the integral from approximating sums:

$$\int_0^1 g(t)\,dt = \lim_{n\to\infty} (1/n)\sum_{i=1}^n g(i/n).$$

Since each of the components of this approximating sum of vectors is an approximating sum for the integral of the corresponding component function, the two definitions of integral for a vector-valued function of one variable coincide. Since $\|v_1 + v_2\| \le \|v_1\| + \|v_2\|$, it follows for the approximating sum and hence, passing to the limit, for the integral that

$$\Big\|\int g(t)\,dt\Big\| \le \int \|g(t)\|\,dt.$$


Substituting into (6.5) with $g = f'$, we get

$$\|f(b) - f(a)\| \le (b - a)\int_0^1 \|f'(a + t(b - a))\|\,dt \le m(b - a) \quad\text{if } \|f'(t)\| \le m \text{ for all } t \in [a, b]. \qquad (6.6)$$

Here $f$ is an $\mathbb{R}^k$-valued function. We could apply (6.1) to each component $f_j$ of $f$. For each such component we would get $f_j(b) - f_j(a) = f_j'(z_j)(b - a)$, but the $z_j$ would vary from one $j$ to another. There will, in general, be no point $z$ that can work for all the $f_j$'s, and so the analogue of (6.1) need not be true. Nevertheless, (6.5) is true.

We now want to generalize (6.6) to the case where $f$ is a map from $V\to W$ and where $V$ is not necessarily one-dimensional. We have already observed that the generalization of $f'(x)$ is $df_x$. Now $df_x$ is a linear transformation, and we have to understand what we mean by $\|A\|$ when $A$ is a linear transformation from $V$ to $W$. We define


$$\|A\| = \max_{\|u\| = 1} \|Au\|$$

or, equivalently,

$$\|A\| = \max_{v \ne 0} \frac{\|Av\|}{\|v\|}.$$

Thus

$$\|Av\| \le \|A\|\,\|v\| \quad\text{for all } v$$

and $\|A\|$ is the smallest number with this property, i.e., if

$$\|Av\| \le k\,\|v\| \quad\text{for all } v,$$

then

$$\|A\| \le k.$$

If $A_1$ and $A_2$ are two linear transformations from $V$ to $W$, then

$$\|(A_1 + A_2)v\| = \|A_1 v + A_2 v\| \le \|A_1 v\| + \|A_2 v\| \le (\|A_1\| + \|A_2\|)\,\|v\|$$

so

$$\|A_1 + A_2\| \le \|A_1\| + \|A_2\|. \qquad (6.7)$$
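
To make the definition of $\|A\|$ concrete, here is a small sketch for $2\times 2$ matrices. It is only an illustration under stated assumptions: the norm is estimated by sampling unit vectors on the circle (so it slightly underestimates the true maximum), and the matrices and helper names are hypothetical.

```python
import math

def apply(A, v):
    # Matrix-vector product for a 2x2 matrix A given as nested lists.
    return (A[0][0] * v[0] + A[0][1] * v[1],
            A[1][0] * v[0] + A[1][1] * v[1])

def norm(v):
    # Euclidean length of a vector in the plane.
    return math.hypot(v[0], v[1])

def operator_norm_sampled(A, samples=20000):
    # Estimate max ||Au|| over unit vectors u by sampling the unit circle.
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        best = max(best, norm(apply(A, (math.cos(t), math.sin(t)))))
    return best

def mat_add(A, B):
    # Entrywise sum of two 2x2 matrices.
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

if __name__ == "__main__":
    A1 = [[12.0, -3.0], [2.0, 4.0]]
    A2 = [[1.0, 2.0], [-3.0, 0.5]]
    n1 = operator_norm_sampled(A1)
    n2 = operator_norm_sampled(A2)
    n12 = operator_norm_sampled(mat_add(A1, A2))
    print(n1, n2, n12)  # n12 should not exceed n1 + n2, as in (6.7)
```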


For any points $a$ and $b$ in $V$ we shall let $[a, b]$ denote the line segment joining $a$ to $b$, so $[a, b]$ consists of all points of the form $a + t(b - a)$ for $0 \le t \le 1$. (This is a natural generalization of the interval $[a, b]$ on the real line.)

We wish to prove the following:

Suppose that $f\colon V\to W$ is differentiable at all points of $[a, b]$ and its differential, $df_x$, is a continuous function of $x$ on this segment. Suppose that $\|df_x\| \le m$ for all $x \in [a, b]$. Then

$$\|f(b) - f(a)\| \le m\,\|b - a\|. \qquad (6.8)$$


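Let $h\colon [0, 1]\to V$ be given by

$$h(t) = a + t(b - a)$$

and set $F = f\circ h$. Then $F$ maps $[0, 1]\to W$ and, by the chain rule,

$$dF_t = df_{h(t)}\circ dh_t.$$

Now $dh_t[1] = b - a$ so

$$F'(t) = dF_t[1] = df_{h(t)}[b - a].$$

Also

$$f(b) - f(a) = F(1) - F(0) = \int_0^1 F'(t)\,dt = \int_0^1 df_{h(t)}[b - a]\,dt = A(b - a) \qquad (6.9)$$

where $A$ is the linear transformation

$$A = \int_0^1 df_{h(t)}\,dt. \qquad (6.10)$$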




In (6.10) we are integrating a linear-transformation-valued function, the function which assigns to each $t$ the linear transformation $df_{h(t)}$. We can treat such integrals just as we dealt with vector-valued integrals. For instance, since $V$ and $W$ have standard bases, we can identify every linear transformation with a matrix. The integral of a matrix-valued function $g$, where

$$g(t) = (g_{ij}(t)),$$

is given as the matrix whose $ij$th entry is the integral of the numerical valued function $g_{ij}$. Or, as before, the integral can be given as a limit of approximating sums. It follows then from (6.7) that

$$\Big\|\int g(t)\,dt\Big\| \le \int \|g(t)\|\,dt.$$

In particular, substituting into (6.10) and using the hypothesis that $\|df_x\| \le m$ for all $x \in [a, b]$, we conclude that

$$\|A\| \le m$$

and hence, from (6.9), that (6.8) holds.


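If $f$ is a function on the plane whose partial derivatives are themselves differentiable, we can form the second partial derivatives

$$\frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial x}\Big), \text{ which we denote by } \frac{\partial^2 f}{\partial x^2},$$

$$\frac{\partial}{\partial y}\Big(\frac{\partial f}{\partial y}\Big), \text{ which we denote by } \frac{\partial^2 f}{\partial y^2},$$

and

$$\frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial y}\Big), \text{ which we denote by } \frac{\partial^2 f}{\partial x\,\partial y}.$$

We have already seen that

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial y}\Big(\frac{\partial f}{\partial x}\Big).$$

Similarly we can define higher-order partial derivatives when they exist, and have the appropriate equality among mixed partials. For example,

$$\frac{\partial}{\partial x}\Big(\frac{\partial}{\partial y}\Big(\frac{\partial f}{\partial x}\Big)\Big) = \frac{\partial}{\partial y}\Big(\frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial x}\Big)\Big) = \frac{\partial}{\partial x}\Big(\frac{\partial}{\partial x}\Big(\frac{\partial f}{\partial y}\Big)\Big)$$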

etc. The significance of the second (and similarly higher) derivatives is given by 
Taylor’s formula which we will now state and prove. 


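For simplicity, we first state and prove it at the origin. Suppose that $f$ has continuous second partial derivatives. Then

$$f(x, y) - f(0, 0) = \int_0^1 \frac{d}{dt} f(tx, ty)\,dt = x\int_0^1 \frac{\partial f}{\partial x}(tx, ty)\,dt + y\int_0^1 \frac{\partial f}{\partial y}(tx, ty)\,dt,$$

so, writing $f_1(x, y)$ and $f_2(x, y)$ for the two integrals,

$$f(x, y) = f(0, 0) + x f_1(x, y) + y f_2(x, y)$$

or, more succinctly,

$$f = f(0, 0) + x f_1 + y f_2.$$

Furthermore,

$$f_1(0, 0) = \frac{\partial f}{\partial x}(0, 0) \quad\text{and}\quad f_2(0, 0) = \frac{\partial f}{\partial y}(0, 0).$$

Now apply the same argument to $f_1$ and $f_2$:

$$f_1(x, y) = f_1(0, 0) + x f_{11}(x, y) + y f_{12}(x, y)$$

where

$$f_{11}(x, y) = \int_0^1 \frac{\partial f_1}{\partial x}(tx, ty)\,dt \quad\text{and}\quad f_{12}(x, y) = \int_0^1 \frac{\partial f_1}{\partial y}(tx, ty)\,dt$$

and similarly,

$$f_2 = f_2(0, 0) + x f_{21} + y f_{22}.$$

Then

$$f = f(0, 0) + x f_1(0, 0) + y f_2(0, 0) + x^2 f_{11} + xy(f_{12} + f_{21}) + y^2 f_{22}.$$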


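If $f$ has continuous derivatives up to third order, we can repeat the process once again to get

$$f = f(0, 0) + x f_1(0, 0) + y f_2(0, 0) + x^2 f_{11}(0, 0) + xy\big(f_{12}(0, 0) + f_{21}(0, 0)\big) + y^2 f_{22}(0, 0)$$
$$\quad + x^3 f_{111} + x^2 y\,(f_{112} + f_{121} + f_{211}) + x y^2 (f_{122} + f_{212} + f_{221}) + y^3 f_{222}$$

where all the functions $f_{111}, f_{112}$, etc., are continuous. If we compute the second derivatives of both sides of this equation at the origin, we conclude that

$$2 f_{11}(0, 0) = \frac{\partial^2 f}{\partial x^2}(0, 0),$$

$$f_{12}(0, 0) + f_{21}(0, 0) = \frac{\partial^2 f}{\partial x\,\partial y}(0, 0)$$

and

$$2 f_{22}(0, 0) = \frac{\partial^2 f}{\partial y^2}(0, 0).$$

Thus we have proved

$$f(x, y) = f(0, 0) + x\frac{\partial f}{\partial x}(0, 0) + y\frac{\partial f}{\partial y}(0, 0) + \tfrac12 x^2 \frac{\partial^2 f}{\partial x^2}(0, 0) + xy\frac{\partial^2 f}{\partial x\,\partial y}(0, 0) + \tfrac12 y^2 \frac{\partial^2 f}{\partial y^2}(0, 0) + O\big((x^2 + y^2)^{3/2}\big). \qquad (6.11)$$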


It is clear that, if $f$ has still higher-order continuous derivatives, we can keep on going. It is also clear that the same argument works in $\mathbb{R}^k$ as well as in $\mathbb{R}^2$. Finally, we may replace the origin by any vector $u$ and $(x, y)$ by $u + v$:

Let $f\colon \mathbb{R}^k\to\mathbb{R}$ have continuous derivatives up to order $n + 1$. Then there is a polynomial $P_n$ in the coordinates of $v$ such that

$$f(u + v) = P_n(v) + O(\|v\|^{n+1}).$$

The coefficients of $P_n(v)$ can be determined by successive differentiations and evaluation at $v = 0$.

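If $f\colon \mathbb{R}^2\to\mathbb{R}^1$, the matrix of second partial derivatives

$$H = \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y\\ \partial^2 f/\partial y\,\partial x & \partial^2 f/\partial y^2 \end{pmatrix}$$

is called the Hessian matrix and the corresponding quadratic form is denoted by $d^2 f_P$, so that

$$f(P + v) = f(P) + df_P(v) + \tfrac12 d^2 f_P(v) + o(\|v\|^2),$$

$$d^2 f_P(v) = v^{T} H v = (v_1, v_2)\,H \begin{pmatrix} v_1\\ v_2 \end{pmatrix}. \qquad (6.12)$$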


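The Hessian $d^2 f_P$, as a quadratic form, is subject to the analysis we presented in Chapter 4. For example, if $f(P) = [x(P)]^2 y(P)$, then, at the point where $x = 2$, $y = 3$, we have

$$\frac{\partial f}{\partial x} = 2xy = 12, \qquad \frac{\partial f}{\partial y} = x^2 = 4,$$

$$\frac{\partial^2 f}{\partial x^2} = 2y = 6, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 2x = 4, \qquad \frac{\partial^2 f}{\partial y^2} = 0,$$

so that at $P = \begin{pmatrix} 2\\ 3 \end{pmatrix}$, $df$ is represented by $(12, 4)$ and $H$ by $\begin{pmatrix} 6 & 4\\ 4 & 0 \end{pmatrix}$.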
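
The numbers in this example can be reproduced by finite differences. The sketch below is only an illustration: the helper names and the step sizes are assumptions, and central differences are used as a stand-in for the exact partial derivatives.

```python
def f(x, y):
    # The example function f = x^2 y.
    return x * x * y

def gradient(func, x, y, h=1e-5):
    # Central-difference approximations to the first partial derivatives.
    fx = (func(x + h, y) - func(x - h, y)) / (2 * h)
    fy = (func(x, y + h) - func(x, y - h)) / (2 * h)
    return (fx, fy)

def hessian(func, x, y, h=1e-4):
    # Central-difference approximations to the second partial derivatives.
    fxx = (func(x + h, y) - 2 * func(x, y) + func(x - h, y)) / h**2
    fyy = (func(x, y + h) - 2 * func(x, y) + func(x, y - h)) / h**2
    fxy = (func(x + h, y + h) - func(x + h, y - h)
           - func(x - h, y + h) + func(x - h, y - h)) / (4 * h * h)
    return [[fxx, fxy], [fxy, fyy]]

if __name__ == "__main__":
    print(gradient(f, 2.0, 3.0))   # close to (12, 4)
    print(hessian(f, 2.0, 3.0))    # close to [[6, 4], [4, 0]]
```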


Maxima and minima 

The Hessian is especially useful in analyzing the behavior of a function near a critical point where its differential $df$ is zero. If $P_0 = \begin{pmatrix} x_0\\ y_0 \end{pmatrix}$ is such a point, then

$$f(P_0 + v) - f(P_0) \approx \tfrac12 d^2 f_{P_0}(v).$$

If the quadratic form $d^2 f_{P_0}$ is positive definite ($H$ has two positive eigenvalues), then it follows from Taylor's formula that $f(P_0 + v) > f(P_0)$ for small $v \ne 0$ and $f$ achieves a minimum at $P_0$. If $d^2 f_{P_0}$ is negative definite ($H$ has two negative eigenvalues), then $f(P_0 + v) < f(P_0)$ and $f$ achieves a maximum at $P_0$. Finally, if $H$ has one positive and one negative eigenvalue, then $f(P_0 + v) - f(P_0)$ takes on both positive and negative values for small $v$, so that $f$ achieves neither a maximum nor a minimum at $P_0$; what it has there is a saddle point. If $H$ has one or more zero eigenvalues, and is therefore singular, we have to inspect higher derivatives to determine whether $f$ has a maximum or a minimum at $P_0$.

As an example of using the Hessian, we find and classify the critical points of 
the function 

$$f = 3x^2 + 2y^3 - 6xy.$$

To locate the critical points, we set the partial derivatives with respect to $x$ and $y$ equal to zero:

$$\frac{\partial f}{\partial x} = f_1(x, y) = 6x - 6y = 0 \quad\text{so } x = y,$$

$$\frac{\partial f}{\partial y} = f_2(x, y) = 6y^2 - 6x = 0 \quad\text{so } x = y^2.$$

The critical points are therefore at $x = 0$, $y = 0$ and $x = 1$, $y = 1$. The Hessian is

$$H = \begin{pmatrix} 6 & -6\\ -6 & 12y \end{pmatrix},$$

which at $x = 0$, $y = 0$ is

$$H = \begin{pmatrix} 6 & -6\\ -6 & 0 \end{pmatrix}.$$

This has a negative determinant, hence its eigenvalues are of opposite sign and the critical point at the origin is a saddle point. To confirm this conclusion, we note that $f(x, y)$ is positive for points near the origin along the $x$-axis, while along the line $x = y$ the function is negative near the origin.

At $x = 1$, $y = 1$ the Hessian is

$$H = \begin{pmatrix} 6 & -6\\ -6 & 12 \end{pmatrix}.$$

This has positive determinant, so its eigenvalues are of the same sign; since the trace is positive, both eigenvalues are positive. Hence $f(x, y)$ has a relative minimum at $x = 1$, $y = 1$.
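
The determinant-and-trace test used in this example is easy to mechanize. The following is a minimal sketch for the specific function $f = 3x^2 + 2y^3 - 6xy$; the function names are hypothetical and the partial derivatives are written out by hand rather than computed symbolically.

```python
def grad_f(x, y):
    # First partial derivatives of f = 3x^2 + 2y^3 - 6xy.
    return (6 * x - 6 * y, 6 * y * y - 6 * x)

def hessian_f(x, y):
    # Matrix of second partial derivatives of f.
    return [[6.0, -6.0], [-6.0, 12.0 * y]]

def classify(H):
    # Classify a critical point from the sign of det(H) and trace(H).
    det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    trace = H[0][0] + H[1][1]
    if det < 0:
        return "saddle"          # eigenvalues of opposite sign
    if det > 0:
        return "minimum" if trace > 0 else "maximum"
    return "degenerate"          # a zero eigenvalue: inspect higher derivatives

if __name__ == "__main__":
    for point in [(0.0, 0.0), (1.0, 1.0)]:
        print(point, grad_f(*point), classify(hessian_f(*point)))
```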

On an affine plane, the only property of the second differential $d^2 f$ which is independent of choice of coordinates is the number of positive, negative, and zero eigenvalues of the Hessian. On a Euclidean plane, we can inquire about another coordinate-independent property of a function $f$: namely, how its average value on a small circle surrounding a point $P_0$ compares with its value at $P_0$. We write, from (6.11),

$$f(P_0 + v) = f(P_0) + df[v] + \tfrac12 d^2 f(v) + \text{error}.$$



Figure 6.2 


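Since $df[-v] = -df[v]$, the average value of $df[v]$ for any circle centered at $P_0$ is clearly zero. To find the average value of $\tfrac12 d^2 f(v)$, we set $v = \begin{pmatrix} h\cos\theta\\ h\sin\theta \end{pmatrix}$ so that using (6.12)

$$\tfrac12 d^2 f(v) = \tfrac12 (h\cos\theta,\ h\sin\theta) \begin{pmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x\,\partial y\\ \partial^2 f/\partial x\,\partial y & \partial^2 f/\partial y^2 \end{pmatrix} \begin{pmatrix} h\cos\theta\\ h\sin\theta \end{pmatrix}$$

or

$$\tfrac12 d^2 f(v) = \tfrac12 h^2 \Big[\frac{\partial^2 f}{\partial x^2}\cos^2\theta + 2\frac{\partial^2 f}{\partial x\,\partial y}\cos\theta\sin\theta + \frac{\partial^2 f}{\partial y^2}\sin^2\theta\Big].$$

Since the average value of $\cos^2\theta$ or $\sin^2\theta$ on $[0, 2\pi]$ is $\tfrac12$, while the average value of $\sin\theta\cos\theta$ is zero, we see that

$$\langle \tfrac12 d^2 f(v)\rangle_{\text{average}} = \tfrac14 h^2 \Big[\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\Big]$$

and that

$$\langle f(P_0 + v)\rangle_{\text{average}} = f(P_0) + \tfrac14 h^2 \Big[\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}\Big] + \text{error}.$$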
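
The averaging formula can be tested numerically. In the sketch below the test function $f = x^2 y + 3x^2 - y^2$ and the point $(2, 3)$ are hypothetical choices made only for illustration; for this $f$ the sum of the pure second partials at $(2, 3)$ is $(2y + 6) + (-2) = 10$. Because this particular $f$ is a cubic polynomial, the error term averages to zero and the agreement is essentially exact.

```python
import math

def f(x, y):
    # A hypothetical test function, not taken from the text.
    return x * x * y + 3 * x * x - y * y

def circle_average(func, x0, y0, h, samples=720):
    # Average of func over equally spaced points on the circle of radius h.
    total = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        total += func(x0 + h * math.cos(t), y0 + h * math.sin(t))
    return total / samples

def predicted_average(f0, second_partials_sum, h):
    # f(P0) + (1/4) h^2 (f_xx + f_yy), dropping the error term.
    return f0 + 0.25 * h * h * second_partials_sum

if __name__ == "__main__":
    h = 0.1
    print(circle_average(f, 2.0, 3.0, h))
    print(predicted_average(f(2.0, 3.0), 10.0, h))
```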


The quantity $\partial^2 f/\partial x^2 + \partial^2 f/\partial y^2$, which determines whether $f$ increases or decreases 'on the average' as we move away from $P_0$, is called the Laplacian of $f$. By virtue of its definition in terms of an average over a circle, for any coordinates obtained from $x$ and $y$ by a rotation, the Laplacian will have the same value. For this reason the equation

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0,$$

known as Laplace's equation, has the same form in all such coordinates. This equation arises frequently in conjunction with functions on a plane which have a physical significance: electric potential, for example, or temperature.

Before leaving the subject of maxima and minima, we shall consider the constrained extremum problem on the plane: where, along the curve defined by $g(P) = \text{constant}$, does the function $f(P)$ achieve a maximum or a minimum?

Figure 6.3

A necessary condition for such an extremum to occur at $P_0$ is that $df_{P_0}(v) = 0$ for any vector $v$ which lies tangent to the curve $g(P) = \text{constant}$. But such a vector satisfies $dg(v) = 0$. It follows that, at the point $P_0$ where the maximum or minimum is achieved, $df_{P_0}$ must be a multiple of $dg_{P_0}$: say $df = \lambda\,dg$. Thus we are led to the Lagrange multiplier method for the constrained extremum problem: to maximize or minimize $f(P)$ along the curve $g(P) = \text{constant}$, we look for points where $df = \lambda\,dg$. The two equations

$$\frac{\partial f}{\partial x} = \lambda\frac{\partial g}{\partial x}, \qquad \frac{\partial f}{\partial y} = \lambda\frac{\partial g}{\partial y},$$

along with $g = \text{constant}$, determine the unknown quantities $x$, $y$, and $\lambda$. To determine whether the extremum thus found is a maximum, a minimum, or neither, we consider the function $h = f - \lambda g$, which, by construction, has a critical point at $P_0$ as a function on the plane.
We calculate the best quadratic approximation to $h$ near this critical point and evaluate it on our vector $v$ for which $df(v) = dg(v) = 0$. If this quantity, $\tfrac12 d^2 h(v)$, is positive, we claim that $h(P) > h(P_0)$ at all points near $P_0$ on the curve. Indeed, suppose we parameterize the curve $g = 0$ by $p = p(t)$. That is, we choose a function $p\colon \mathbb{R}\to\mathbb{R}^2$ such that

$$g(p(t)) = 0, \quad p(0) = P_0 \quad\text{and}\quad p'(0) = v.$$

(That this is always possible will be proved in the next section - it is a consequence of the implicit function theorem to be proved there.) Then

$$h\circ p = f\circ p \quad\text{since } g\circ p = 0.$$

Also

$$(h\circ p)'(0) = dh_{P_0}(v) = 0$$

and, since $dh_{P_0} = 0$,

$$(h\circ p)''(0) = d^2 h_{P_0}(v).$$

Thus $(h\circ p)''(0) > 0$ and hence $f$ has a minimum at $P_0$ along $g = 0$. Similarly, $f$ has a maximum along the curve $g = 0$ if $d^2 h(v) < 0$.

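For example, suppose we wish to maximize the quadratic form

$$Q(x, y) = 8x^2 - 12xy + 17y^2$$

on the circle $G(x, y) = x^2 + y^2 = 1$. Setting $dQ = \lambda\,dG$, we find

$$16x - 12y - 2\lambda x = 0,$$
$$-12x + 34y - 2\lambda y = 0.$$

On eliminating $\lambda$ between these equations, we find

$$16 - 12(y/x) = -12(x/y) + 34$$

or

$$(x/y) - (y/x) = 3/2.$$

Thus

$$x^2 - \tfrac32 xy - y^2 = 0$$

or

$$(x - 2y)(x + \tfrac12 y) = 0.$$

The lines $x = 2y$ and $x = -\tfrac12 y$ intersect the circle $x^2 + y^2 = 1$ at the points $P_1 = \pm\big(2/\sqrt5,\ 1/\sqrt5\big)$ and $P_2 = \pm\big(1/\sqrt5,\ -2/\sqrt5\big)$ respectively. To determine the nature of these critical points, we compute

$$dQ = (16x - 12y)\,dx + (-12x + 34y)\,dy, \qquad dG = 2x\,dx + 2y\,dy.$$

Figure 6.4

At $P_1$, where $x = 2y$, we have

$$dQ = 10x\,dx + 5x\,dy, \qquad dG = 2x\,dx + x\,dy.$$

As we expected, $dQ$ is a multiple of $dG$, with $\lambda = 5$. A vector $v$ for which $dQ(v) = dG(v) = 0$ is $\begin{pmatrix} 1\\ -2 \end{pmatrix}$. The matrix $\begin{pmatrix} 3 & -6\\ -6 & 12 \end{pmatrix}$ represents the quadratic form $Q - 5G$.

We calculate

$$(1, -2)\begin{pmatrix} 3 & -6\\ -6 & 12 \end{pmatrix}\begin{pmatrix} 1\\ -2 \end{pmatrix} = (1, -2)\begin{pmatrix} 15\\ -30 \end{pmatrix} = 75$$

and conclude that $Q$ has a minimum on the circle at $P_1$.

At the other critical point, where $y = -2x$, we find

$$dQ = 40x\,dx - 80x\,dy; \qquad dG = 2x\,dx - 4x\,dy, \quad\text{so } \lambda = 20.$$

A vector for which $dQ(v) = dG(v) = 0$ is $\begin{pmatrix} 2\\ 1 \end{pmatrix}$ and, on evaluating the quadratic form $Q - 20G$ for this vector, we find that

$$(2, 1)\begin{pmatrix} -12 & -6\\ -6 & -3 \end{pmatrix}\begin{pmatrix} 2\\ 1 \end{pmatrix} = -75$$

so that $Q$ has a local maximum on the circle $x^2 + y^2 = 1$ at the point $P_2$.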
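
As a sanity check on this example, one can simply scan the unit circle. This is a brute-force sketch rather than the Lagrange multiplier method itself; the function names and the number of sample points are assumptions made for illustration.

```python
import math

def Q(x, y):
    # The quadratic form from the example.
    return 8 * x * x - 12 * x * y + 17 * y * y

def extremes_on_unit_circle(samples=100000):
    # Scan the unit circle and record the smallest and largest values of Q.
    lo, hi = float("inf"), float("-inf")
    for k in range(samples):
        t = 2 * math.pi * k / samples
        value = Q(math.cos(t), math.sin(t))
        lo, hi = min(lo, value), max(hi, value)
    return lo, hi

if __name__ == "__main__":
    s = 1 / math.sqrt(5)
    print(Q(2 * s, s))        # value of Q on the line x = 2y
    print(Q(s, -2 * s))       # value of Q on the line y = -2x
    print(extremes_on_unit_circle())
```

The values of $Q$ at the two critical points coincide with the multipliers $\lambda = 5$ and $\lambda = 20$, which is what one expects for a homogeneous quadratic form restricted to the unit circle.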





6.3. The inverse function theorem

Let $U$ and $V$ be vector spaces of the same dimension, and let $f\colon U\to V$ be a differentiable map with $f(p_0) = q_0$. We would like to know when there exists an inverse map $g\colon V\to U$ such that $g\circ f = \mathrm{id}$. Before we formulate the appropriate theorem, we first examine some necessary limitations on the problem.

If we expect that $g$ is also to be differentiable, then the chain rule says that

$$dg_{f(p)}\circ df_p = \mathrm{id}.$$

Thus the linear map $df_p$ had better be invertible. If it is, then we expect the formula

$$dg_{f(p)} = [df_p]^{-1}$$

to hold.



Figure 6.5 



For example, suppose $U = V = \mathbb{R}^1$ and $f(x) = x^2$. At $x = 0$, $df_0 = 0$ and there is trouble with $g(y) = \sqrt{y}$ near $y = 0$. In fact, there are three kinds of trouble. First of all, $\sqrt{y}$ is not defined (over the reals) for $y < 0$. More precisely, no point $y < 0$ is in the image of $f$. Secondly, the square root of $y$ is not uniquely specified: for a given $y > 0$ there are two values of $x$ with $x^2 = y$. Thirdly, the derivative of $\sqrt{y}$ blows up as $y\to 0$. To get around the second of these difficulties, we can proceed as follows. Suppose we choose some $x_0 \ne 0$ with $x_0^2 = y_0$. For the sake of argument, suppose $x_0 > 0$. Then in a sufficiently small neighborhood about $y_0$ (small enough so as not to include $y = 0$), there is a unique inverse function, the square root, determined by the condition that its values be close enough to $x_0$. (In this case 'close enough' means not to be negative - once we specify that the square root be positive, it is uniquely determined.)

We can not only assert the existence of the square root, we can give an algorithm for computing as close an approximation to the square root as we like. We recall one of these algorithms - Newton's method - but formulate it more generally.

Suppose we are given a map $f\colon U\to V$ with $f(p_0) = q_0$. We are given some $q$ near $q_0$ and wish to find a $p$ near $p_0$ such that $f(p) = q$. Finding $p$ is the same as finding $p - p_0$. We wish to have

$$f(p_0 + p - p_0) = q.$$

But

$$f(p_0 + p - p_0) = f(p_0) + df_{p_0}(p - p_0) + o(p - p_0)$$
$$= q_0 + df_{p_0}(p - p_0) + o(p - p_0).$$

If we could ignore the term $o(p - p_0)$, we would obtain the approximate equation

$$q - q_0 = df_{p_0}(p - p_0)$$

or, since $df_{p_0}$ has an inverse,

$$p - p_0 = df_{p_0}^{-1}(q - q_0).$$

This suggests defining

$$p_1 = p_0 + df_{p_0}^{-1}(q - q_0)$$

as an approximate solution, then defining

$$q_1 = f(p_1)$$

and starting anew. Thus, in Newton's method,

$$p_2 = p_1 + df_{p_1}^{-1}(q - q_1),$$

etc.

Suppose $U = V = \mathbb{R}^1$ and $f(p) = p^2$. Then $df_p$ is multiplication by $2p$ and hence $df_p^{-1}(w) = w/2p$. Thus, in this case,

$$p_1 = p_0 + \frac{1}{2p_0}(q - q_0).$$

For example, suppose we take

$$p_0 = 3 \quad\text{so } q_0 = 9$$

and take $q = 10$. Then

$$p_1 = 3 + \tfrac16(10 - 9) = 3.166\ldots$$

Then

$$q_1 = p_1^2 = 10.02777\ldots$$

(Notice that $p_1$ is already a much better approximation to $\sqrt{10}$.) The next approximation is given by

$$p_2 = 3.166\ldots + \frac{1}{2\times 3.166\ldots}(10 - 10.02777\ldots) = 3.1622807\ldots$$

Then

$$q_2 = p_2^2 = 10.000019\ldots$$

so $p_2$ is correct to four decimal places.
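
The iteration just carried out by hand can be written as a few lines of code. This is a minimal sketch for the special case $f(p) = p^2$ only; the function name and the list-of-iterates return value are illustrative choices, not anything from the text.

```python
def newton_sqrt(q, p0, steps):
    # Newton's method for f(p) = p^2: p_{i+1} = p_i + (q - p_i^2) / (2 p_i).
    p = p0
    iterates = [p]
    for _ in range(steps):
        p = p + (q - p * p) / (2 * p)
        iterates.append(p)
    return iterates

if __name__ == "__main__":
    for i, p in enumerate(newton_sqrt(10.0, 3.0, 4)):
        print(i, p, p * p)
```

Printing a few more steps shows the number of correct digits roughly doubling at each stage, which is the fast convergence discussed later in this section.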

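Let us give a second example, with $U = V = \mathbb{R}^2$. Suppose that the map $f$ is given by

$$f\begin{pmatrix} x\\ y \end{pmatrix} = \begin{pmatrix} x^3 - y^3\\ 2xy \end{pmatrix}.$$

Then $df_{(x,y)}$ is the linear transformation whose matrix is

$$\begin{pmatrix} 3x^2 & -3y^2\\ 2y & 2x \end{pmatrix}.$$

Suppose

$$p_0 = \begin{pmatrix} 2\\ 1 \end{pmatrix}$$

so that

$$q_0 = \begin{pmatrix} 8 - 1\\ 2\cdot 2\cdot 1 \end{pmatrix} = \begin{pmatrix} 7\\ 4 \end{pmatrix}$$

and

$$df_{p_0} = \begin{pmatrix} 12 & -3\\ 2 & 4 \end{pmatrix}$$

and

$$(df_{p_0})^{-1} = \frac{1}{54}\begin{pmatrix} 4 & 3\\ -2 & 12 \end{pmatrix}.$$

Suppose we take

$$q = \begin{pmatrix} 7.5\\ 3.8 \end{pmatrix}.$$

Then

$$p_1 = p_0 + (df_{p_0})^{-1}(q - q_0) = \begin{pmatrix} 2\\ 1 \end{pmatrix} + \frac{1}{54}\begin{pmatrix} 4 & 3\\ -2 & 12 \end{pmatrix}\begin{pmatrix} 0.5\\ -0.2 \end{pmatrix} = \begin{pmatrix} 2.026\\ 0.937 \end{pmatrix}.$$

Computing $f(p_1)$ we get

$$\begin{pmatrix} 7.493\\ 3.796 \end{pmatrix}$$

which is already quite close. Notice that at each successive stage in this algorithm we have to compute a different value of $df^{-1}$.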
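
The two-dimensional example can be carried further by machine. The sketch below is an illustration under stated assumptions: the helper names are hypothetical, and the $2\times 2$ linear system at each step is solved by Cramer's rule instead of forming the inverse matrix explicitly.

```python
def f(p):
    # The map f(x, y) = (x^3 - y^3, 2xy) from the example.
    x, y = p
    return (x**3 - y**3, 2.0 * x * y)

def df(p):
    # Matrix of the differential of f at p.
    x, y = p
    return ((3.0 * x * x, -3.0 * y * y), (2.0 * y, 2.0 * x))

def solve_2x2(A, b):
    # Solve A d = b for a 2x2 matrix A by Cramer's rule.
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return ((a22 * b[0] - a12 * b[1]) / det,
            (-a21 * b[0] + a11 * b[1]) / det)

def newton_step(p, q):
    # One step: p + (df_p)^{-1} (q - f(p)), recomputing df at the current p.
    fp = f(p)
    d = solve_2x2(df(p), (q[0] - fp[0], q[1] - fp[1]))
    return (p[0] + d[0], p[1] + d[1])

def newton(p0, q, steps):
    p = p0
    for _ in range(steps):
        p = newton_step(p, q)
    return p

if __name__ == "__main__":
    q = (7.5, 3.8)
    p1 = newton_step((2.0, 1.0), q)
    print(p1, f(p1))
    p = newton((2.0, 1.0), q, 6)
    print(p, f(p))
```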

A mathematical theorem will be formulated which asserts that, under suitable hypotheses about $f$, Newton's method will give a sequence of points $p_i$ which converge to a solution $p$ of $f(p) = q$, provided that $q$ is sufficiently close to $q_0$.

Another algorithm which converges much more slowly than Newton's method is to set

$$L = (df_{p_0})^{-1}$$

and

$$p_1 = p_0 + L(q - q_0),$$
$$q_1 = f(p_1),$$
$$p_2 = p_1 + L(q - q_1),$$
$$q_2 = f(p_2), \text{ etc.}$$

This is known as Picard's method. For example, with $f(p) = p^2$ and $p_0 = 3$, we get

$$p_1 = 3.166\ldots$$

An advantage of Picard's method is that we only need to compute $L$ once. It is easier to formulate and prove the slow convergence of Picard's or Newton's method with fewer assumptions about $f$ than it is to prove the fast convergence of Newton's method, which requires more hypotheses about $f$, as we shall see.
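
To compare the two algorithms on the same problem, here is a short sketch for $f(p) = p^2$, $p_0 = 3$, $q = 10$. The function names are hypothetical; the only difference between the two routines is whether the inverse derivative is frozen at $p_0$ (Picard) or recomputed at each step (Newton).

```python
import math

def picard_sqrt(q, p0, steps):
    # Picard's method for f(p) = p^2: L = 1 / (2 p0) is computed once.
    L = 1.0 / (2.0 * p0)
    p = p0
    iterates = [p]
    for _ in range(steps):
        p = p + L * (q - p * p)
        iterates.append(p)
    return iterates

def newton_sqrt_iterates(q, p0, steps):
    # Newton's method: the factor 1 / (2 p) is recomputed at every step.
    p = p0
    iterates = [p]
    for _ in range(steps):
        p = p + (q - p * p) / (2.0 * p)
        iterates.append(p)
    return iterates

if __name__ == "__main__":
    target = math.sqrt(10.0)
    pic = picard_sqrt(10.0, 3.0, 5)
    new = newton_sqrt_iterates(10.0, 3.0, 5)
    for i in range(6):
        print(i, abs(pic[i] - target), abs(new[i] - target))
```

The first step is the same for both methods; after that the Picard error shrinks by a roughly constant factor per step, while the Newton error is roughly squared.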


Proofs of convergence 

We now formulate the hypotheses we need about $f$ and prove the convergence of both methods. Recall that, if $f$ is differentiable at $p$, then

$$f(p^1) = f(p) + df_p(p^1 - p) + o(p^1 - p)$$

which means that given any $\varepsilon$ we can find a $\delta$ such that

$$\|f(p^1) - f(p) - df_p(p^1 - p)\| \le \varepsilon\,\|p^1 - p\| \qquad (6.13)$$

whenever $\|p^1 - p\| \le \delta$. The $\delta$ that is required for this inequality may depend on the point $p$. Let us assume that $f$ is uniformly differentiable in the sense that for any $\varepsilon$ we can find a $\delta$ such that (6.13) holds for all points $p$ and $p^1$ in some ball centered at $p_0$. So we assume that there is some $a > 0$ such that given any $\varepsilon$ we can find a $\delta$ such that (6.13) holds if

$$\|p^1 - p\| \le \delta, \quad \|p - p_0\| \le a, \quad \|p^1 - p_0\| \le a.$$


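Let us also assume that $df_p^{-1}$ is uniformly bounded, i.e., that there is some constant $K$ such that

$$\|df_p^{-1}\| < K \quad\text{for all } \|p - p_0\| \le a, \qquad (6.14)$$

and assume that $df_p$ itself is uniformly bounded:

$$\|df_p\| < M \quad\text{for all } \|p - p_0\| \le a.$$

For Picard's method we assume instead that for any $\varepsilon > 0$ there is a $\delta$ such that

$$\|df_p - df_{p^1}\| \le \varepsilon \quad\text{if } \|p - p^1\| \le \delta,\ \|p - p_0\| \le a,\ \|p^1 - p_0\| \le a \qquad (6.15)$$

and we only need assume that $(df_{p_0})^{-1}$ exists.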

Convergence of Newton’s method 

Now let us look at Newton's method. The step going from $p_i$ to $p_{i+1}$ is given by

$$p_{i+1} = p_i + (df_{p_i})^{-1}(q - f(p_i)).$$

But

$$f(p_i) = f(p_{i-1} + p_i - p_{i-1}) = f(p_{i-1}) + df_{p_{i-1}}(p_i - p_{i-1}) + o(p_i - p_{i-1})$$

and, by the definition of $p_i$,

$$q - f(p_{i-1}) - df_{p_{i-1}}(p_i - p_{i-1}) = 0,$$

so that $q - f(p_i) = -o(p_i - p_{i-1})$. Also, if $\|p_i - p_{i-1}\| \le \delta$ then $\|o(p_i - p_{i-1})\| \le \varepsilon\,\|p_i - p_{i-1}\|$ by (6.13). So

$$\|p_{i+1} - p_i\| \le K\varepsilon\,\|p_i - p_{i-1}\|. \qquad (6.16)$$

We may choose $\varepsilon$ small enough so that $K\varepsilon < \tfrac12$, provided $\delta$ is sufficiently small. Now

$$p_1 = p_0 + df_{p_0}^{-1}(q - q_0), \quad q_0 = f(p_0)$$

so if

$$\|q - q_0\| < \delta/K$$

then

$$\|p_1 - p_0\| < \delta.$$

If $2\delta < a$, so that $\delta < \tfrac12 a$, the point $p_1$ will satisfy

$$\|p_1 - p_0\| < a$$

so that, in particular, $p_1$ is in the domain of definition of $f$, and we can use the algorithm to define $p_2$. It follows from (6.16) that

$$\|p_2 - p_1\| < \tfrac12\delta$$

so

$$\|p_2 - p_0\| \le \|p_2 - p_1\| + \|p_1 - p_0\| < (\tfrac12 + 1)\tfrac12 a < a.$$

Thus $p_2$ is again in the ball of radius $a$ so we can apply the algorithm and (6.16) to get

$$\|p_3 - p_2\| \le \tfrac12\|p_2 - p_1\|$$

and hence

$$\|p_3 - p_0\| < (\tfrac14 + \tfrac12 + 1)\tfrac12 a < a,$$

etc. We can always continue to the next step since (by induction)

$$\|p_i - p_{i-1}\| < \delta/2^{i-1}$$

and

$$\|p_i - p_0\| < \Big(1 + \tfrac12 + \cdots + \frac{1}{2^{i-1}}\Big)\delta < 2\delta < a.$$

The sequence of points $p_i$ converges to some point $p$ since

$$\|p_{i+k} - p_i\| < \big(2^{-i} + \cdots + 2^{-(i+k-1)}\big)\delta < 2^{-(i-1)}\delta \to 0 \quad\text{as } i\to\infty.$$

Finally,

$$\|q - f(p_i)\| = \|df_{p_i}(p_{i+1} - p_i)\| \le M\,\|p_{i+1} - p_i\| \to 0$$

so, by the continuity of $f$, we see that

$$f(p) = q.$$

Uniqueness of solution 

We now look at uniqueness. Notice that $K$ is determined by $f$. We are free to choose a smaller value of $a$, if we wish, without changing $K$. This is at the expense of choosing $\delta$ and hence $\delta/K$ smaller. In particular, we may assume that $a$ has been chosen so small to start with that (6.13) holds for any pair of points $p$ and $p^1$ in the ball, where $\varepsilon K < 1$. Now for any pair of points

$$\|p - p^1\| = \|(df_p)^{-1}(df_p(p - p^1))\| \le K\,\|df_p(p - p^1)\|.$$

If $f(p) = f(p^1)$, then (6.13) gives

$$\|df_p(p^1 - p)\| \le \varepsilon\,\|p^1 - p\|$$

and combining these two inequalities we get

$$\|p - p^1\| \le \varepsilon K\,\|p - p^1\|, \quad \varepsilon K < 1$$

which can only happen if $\|p - p^1\| = 0$, i.e., $p = p^1$. Thus there is at most one solution of $f(p) = q$ with $\|p - p_0\| < a$.



Now let us look at Picard's method. Let $L = (df_{p_0})^{-1}$. Then

$$p_{i+1} = p_i + L(q - f(p_i))$$
$$= p_i + L\big(q - f(p_{i-1}) - df_{p_{i-1}}(p_i - p_{i-1}) - o(p_i - p_{i-1})\big)$$

as before. Now

$$q - f(p_{i-1}) - df_{p_0}(p_i - p_{i-1}) = 0$$

so

$$\|q - f(p_{i-1}) - df_{p_{i-1}}(p_i - p_{i-1})\| = \|(df_{p_0} - df_{p_{i-1}})(p_i - p_{i-1})\| \le \tfrac12\varepsilon\,\|p_i - p_{i-1}\|$$

provided we take $a$ small enough. Also, we can choose $\delta$ small enough so that $\varepsilon$ is replaced by $\tfrac12\varepsilon$ in (6.13). Then

$$\|p_{i+1} - p_i\| \le k\big(\tfrac12\varepsilon\,\|p_i - p_{i-1}\| + \tfrac12\varepsilon\,\|p_i - p_{i-1}\|\big) \le k\varepsilon\,\|p_i - p_{i-1}\|$$

so that (6.16) holds as before, where $k = \|L\|$. Actually, we can use the mean-value theorem to rephrase the argument for the Picard method so as to avoid the use of uniform differentiability. Define the map

$$H(p) = p + L(q - f(p))$$

where $L = (df_{p_0})^{-1}$. Then

$$dH_p = \mathrm{id} - L\,df_p = L(df_{p_0} - df_p).$$

By the continuity of $df$ we can choose $a$ small enough so that $\|dH_p\| \le \tfrac12$ for $\|p - p_0\| \le a$. Then

$$\|p_{i+1} - p_i\| = \|H(p_i) - H(p_{i-1})\| \le \tfrac12\|p_i - p_{i-1}\|$$

by the mean-value theorem and we can proceed as before.

We can also understand why Newton's method converges so much more rapidly, when it works. Suppose that $f$ has two continuous derivatives. Then, by Taylor's formula,

$$\|f(p^1) - f(p) - df_p(p^1 - p)\| \le c\,\|p^1 - p\|^2$$

(where $c$ is a constant given by the maximum of $\|d^2 f\|$), a much stronger inequality than (6.13). Going back to the proof of (6.16) and substituting this inequality, we get

$$\|p_{i+1} - p_i\| \le Kc\,\|p_i - p_{i-1}\|^2.$$

If, for example, we started out with $\|p_1 - p_0\|$ small enough so that $Kc\,\|p_1 - p_0\|^{1/2} \le 1$ (and $\|p_1 - p_0\| < 1$), the above inequality would imply

$$\|p_{i+1} - p_i\| \le \|p_i - p_{i-1}\|^{3/2}$$

and hence

$$\|p_{i+1} - p_i\| \le \|p_1 - p_0\|^{(3/2)^i},$$

an exponential rate of decrease instead of the geometrical one $\|p_{i+1} - p_i\| \le 2^{-i}\,\|p_1 - p_0\|$.

Let us summarize what we know so far. We have shown that under suitable hypotheses there exists a ball $B$ around $q_0 = f(p_0)$ and a ball $C$ around $p_0$ such that for each $q \in B$ there is a unique $p \in C$ with $f(p) = q$. In other words, we have defined a map

$$g\colon B\to C$$

such that

$$f\circ g = \mathrm{id}$$

and

$$g\circ f = \mathrm{id}.$$


Differentiability of solution 

We now want to prove that $g$ is differentiable. Notice that the uniqueness of $g$ implies that $g$ is actually continuous. Indeed, suppose that $g(q) = p$. Draw a small ball around $p$. Applying the existence argument at $p$, we obtain an inverse defined on a small ball around $q$ and taking values in the small ball around $p$.

Figure 6.7

But, by uniqueness, this inverse must coincide with $g$. Hence $g$ maps a small ball around $q$ into a small ball around $p$, i.e., is continuous. Now

$$v = f(g(q + v)) - f(g(q)) = df_p\big(g(q + v) - g(q)\big) + o\big(g(q + v) - g(q)\big).$$

Applying $(df_p)^{-1}$ to both sides we get

$$(df_p)^{-1}(v) = g(q + v) - g(q) + o\big(g(q + v) - g(q)\big).$$

Since $g$ is continuous we can choose $v$ small enough so that $\|o(g(q + v) - g(q))\|$ is smaller than $\tfrac12\|g(q + v) - g(q)\|$. The preceding equation implies that

$$\|(df_p)^{-1}(v)\| + \|o(g(q + v) - g(q))\| \ge \|g(q + v) - g(q)\|$$

so

$$\|g(q + v) - g(q)\| \le 2\,\|(df_p)^{-1}(v)\|,$$

i.e., $g(q + v) - g(q) = O(v)$. But then

$$o\big(g(q + v) - g(q)\big) = o(O(v)) = o(v).$$

So from the preceding equation we get

$$g(q + v) - g(q) = (df_p)^{-1}(v) + o(v),$$

i.e., $g$ is differentiable at $q$ with derivative

$$dg_{f(p)} = (df_p)^{-1}.$$

We have thus proved the 


Inverse function theorem. Let $f: U \to V$ be continuously differentiable with $f(p_0) = q_0$ and $df_{p_0}$ invertible. Then there exist balls $B$ and $C$ around $q_0$ and $p_0$ such that there is a unique map $g: B \to C$ such that $f \circ g = \mathrm{id}$. This map is continuously differentiable and

$$dg_{f(p)} = (df_p)^{-1}.$$
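The formula for the derivative of the inverse can be checked numerically. The sketch below assumes a hypothetical map `f` of the plane (not one from the text) whose Jacobian is invertible everywhere; the local inverse `g` is computed by Newton's method, its Jacobian is estimated by finite differences, and the result is compared with the inverse of the Jacobian of `f`.

```python
import math


def f(p):
    """Hypothetical example map f: R^2 -> R^2, chosen only for illustration."""
    x, y = p
    return (x + 0.5 * math.sin(y), y + 0.5 * math.cos(x))


def df(p):
    """Jacobian matrix of f at p, stored as ((a, b), (c, d))."""
    x, y = p
    return ((1.0, 0.5 * math.cos(y)), (-0.5 * math.sin(x), 1.0))


def inv2(m):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))


def apply(m, v):
    """Apply the 2x2 matrix m to the vector v."""
    return (m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1])


def g(q, p0=(0.0, 0.0), steps=50):
    """Local inverse of f near p0, computed by Newton's method."""
    p = p0
    for _ in range(steps):
        fp = f(p)
        residual = (q[0] - fp[0], q[1] - fp[1])
        step = apply(inv2(df(p)), residual)
        p = (p[0] + step[0], p[1] + step[1])
    return p


def jacobian_of_g(q, eps=1e-6):
    """Central-difference estimate of dg at q; entry [i][j] is d g_i / d q_j."""
    gx_plus, gx_minus = g((q[0] + eps, q[1])), g((q[0] - eps, q[1]))
    gy_plus, gy_minus = g((q[0], q[1] + eps)), g((q[0], q[1] - eps))
    return (
        ((gx_plus[0] - gx_minus[0]) / (2 * eps), (gy_plus[0] - gy_minus[0]) / (2 * eps)),
        ((gx_plus[1] - gx_minus[1]) / (2 * eps), (gy_plus[1] - gy_minus[1]) / (2 * eps)),
    )


q = (0.3, 0.7)
p = g(q)
numeric = jacobian_of_g(q)
exact = inv2(df(p))  # (df_p)^{-1}, as in the theorem
max_err = max(abs(numeric[i][j] - exact[i][j]) for i in range(2) for j in range(2))
print("largest discrepancy:", max_err)
```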


The implicit function theorem 

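Let us draw some consequences of the inverse function theorem. Suppose $G: \mathbb{R}^2 \to \mathbb{R}^1$ with $G(x_0, y_0) = 0$ and $(\partial G/\partial y)(x_0, y_0) \neq 0$. Define $f: \mathbb{R}^2 \to \mathbb{R}^2$ by

$$f\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} x \\ G(x, y) \end{pmatrix}.$$

Then $df$ is invertible at $(x_0, y_0)$, since its determinant there is $\partial G/\partial y \neq 0$, so we can find an inverse map $g$ with $f \circ g = \mathrm{id}$. We may write

$$g\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} F(u, v) \\ H(u, v) \end{pmatrix}$$

so that the equation $f \circ g = \mathrm{id}$ becomes

$$F(u, v) = u,$$
$$G(F(u, v), H(u, v)) = v.$$

Substituting the first equation into the second gives

$$G(u, H(u, v)) = v$$

and setting $v = 0$, $h(u) = H(u, 0)$ gives

$$G(u, h(u)) = 0.$$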


The function $h(u)$ is differentiable and is the unique solution to this equation. The existence, uniqueness and differentiability of $h$ is the content of the implicit function theorem. Thus the implicit function theorem in one variable is a consequence of the inverse function theorem in two variables.

Implicit function theorem. Let $G$ be a differentiable function with $G(x_0, y_0) = 0$ and $(\partial G/\partial y)(x_0, y_0) \neq 0$. Then there exists a unique function $h(x)$ defined near $x = x_0$ such that $h(x_0) = y_0$ and $G(x, h(x)) = 0$. The function $h$ is differentiable and $h'(x) = -(\partial G/\partial x)/(\partial G/\partial y)$.
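As an illustration, here is a minimal sketch for the hypothetical choice $G(x, y) = x^2 + y^2 - 1$ near the point $(0.6, 0.8)$; this example is ours, not the text's. The function `h` solves $G(x, h(x)) = 0$ by Newton's method in the variable $y$, and its slope, estimated by a central difference, is compared with $-(\partial G/\partial x)/(\partial G/\partial y)$.

```python
def G(x, y):
    # Hypothetical example: the unit circle is the level set G = 0.
    return x**2 + y**2 - 1.0


def dG_dx(x, y):
    return 2.0 * x


def dG_dy(x, y):
    return 2.0 * y


def h(x, y_start=0.8, steps=50):
    """Solve G(x, y) = 0 for y near y_start by Newton's method in y."""
    y = y_start
    for _ in range(steps):
        y -= G(x, y) / dG_dy(x, y)
    return y


x0 = 0.6
eps = 1e-6
slope_numeric = (h(x0 + eps) - h(x0 - eps)) / (2 * eps)
slope_formula = -dG_dx(x0, h(x0)) / dG_dy(x0, h(x0))
print(h(x0), slope_numeric, slope_formula)
```

Here $h(0.6) = 0.8$ and both slopes are close to $-0.6/0.8 = -0.75$.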

We can reformulate the preceding argument. The simplest map (other than the constant map) that we can imagine from $\mathbb{R}^2 \to \mathbb{R}^1$ is projection onto one of the factors

$$\pi: \begin{pmatrix} u \\ v \end{pmatrix} \mapsto v.$$

Now let $G: \mathbb{R}^2 \to \mathbb{R}^1$ be any continuously differentiable map and suppose that $dG_{p_0}$ is surjective (which, in our case, where the range is one-dimensional, means that $dG_{p_0} \neq 0$). We claim that there are local changes of coordinates, i.e., maps

$$g: \mathbb{R}^2 \to \mathbb{R}^2$$

locally defined and having a differentiable inverse so that

$$G \circ g = \pi.$$

Indeed, since $dG_{p_0} \neq 0$, we can make a preliminary linear change of coordinates in the plane so that $\partial G/\partial y \neq 0$. Then the above argument applies to give a map $g$ such that $G \circ g = \pi$. Thus, if we allow arbitrary changes of coordinates, the most general continuously differentiable map with $dG_p$ surjective ‘looks like’ projection onto a factor. For example,

$$G(x, y) = (x^2 + y^2)^{1/2}, \qquad G \circ g(\theta, r) = r.$$



Figure 6.8 


The simplest non-trivial map from $\mathbb{R}^1 \to \mathbb{R}^2$ is the map $i$ which simply injects $\mathbb{R}^1$ as the ‘x-axis’:

$$i: \mathbb{R}^1 \to \mathbb{R}^2, \qquad i(x) = \begin{pmatrix} x \\ 0 \end{pmatrix}.$$

We claim that if $G: \mathbb{R}^1 \to \mathbb{R}^2$ is any continuously differentiable map with $dG_{p_0} \neq 0$, we can find a change of coordinates, i.e., a continuously differentiable map $f$ with differentiable inverse such that

$$f \circ G = i.$$



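Indeed, by a preliminary linear change of variables in the plane, we can arrange that $G_1'(p_0) \neq 0$, where

$$G(x) = \begin{pmatrix} G_1(x) \\ G_2(x) \end{pmatrix}.$$

By a translation we may assume that $G(p_0) = 0$. Now define $F: \mathbb{R}^2 \to \mathbb{R}^2$ by

$$F\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} G_1(x) \\ G_2(x) + y \end{pmatrix}.$$

Then

$$dF = \begin{pmatrix} G_1'(x) & 0 \\ G_2'(x) & 1 \end{pmatrix}$$

is invertible near $x = p_0$, $y = 0$, and hence $F$ has a continuously differentiable inverse. Now

$$F \circ i(x) = F\begin{pmatrix} x \\ 0 \end{pmatrix} = G(x)$$

by the definitions of $F$ and $i$. Hence, taking $f = F^{-1}$,

$$i = f \circ G.$$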

Locally, we can ‘straighten out any curve’ by a change of variables. 


6.4. Behavior near a critical point 

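Suppose that $f$ has a critical point at $p_0$. Let us assume that we have made a preliminary choice of coordinates so that $p_0 = 0$ in $\mathbb{R}^2$. Suppose $d^2 f_0$ is non-degenerate, i.e., that $\mathrm{Det}(d^2 f_0) \neq 0$. In other words, we assume that the symmetric matrix

$$H = \begin{pmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} \\ \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \dfrac{\partial^2 f}{\partial x_2^2} \end{pmatrix}$$

is non-singular at 0. By the results of section 4.2, we know that we can make a linear change of coordinates $L$ so that $L H(0) L^T$ has one of the three forms

$$L H(0) L^T = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}, \quad \text{or} \quad \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix}.$$

Let us assume that we have made this preliminary linear change of coordinates, so that $d^2 f_0$ already has one of these three standard forms. Now by our proof of the Taylor expansion, we know that

$$f(x, y) = f(0) + b_{11}(x, y) x^2 + 2 b_{12}(x, y) xy + b_{22}(x, y) y^2 = f(0) + (x, y) \begin{pmatrix} b_{11} & b_{12} \\ b_{12} & b_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = f(0) + (x, y) B \begin{pmatrix} x \\ y \end{pmatrix}$$

where the $b_{ij}$ are continuous functions of $x$ and $y$ and the matrix valued function $B$ when evaluated at the origin is just $d^2 f_0$, i.e.,

$$B(0) = H(0).$$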

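Now $B$ is a symmetric matrix. Let us apply the Gram-Schmidt procedure to $B(\mathbf{x})$. By the continuity of $B$ we know that the scalar products

$$(1, 0) B(\mathbf{x}) \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad (1, 0) B(\mathbf{x}) \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \qquad (0, 1) B(\mathbf{x}) \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

depend continuously on $\mathbf{x}$. Hence for $\mathbf{x}$ close enough to zero we can find an invertible matrix $Q(\mathbf{x})$ (given by the Gram-Schmidt procedure) such that

$$B(\mathbf{x}) = Q(\mathbf{x})^T H(0) \, Q(\mathbf{x}).$$

The Gram-Schmidt algorithm guarantees that $Q$ is a differentiable function of $\mathbf{x}$. Thus

$$f(\mathbf{x}) = f(0) + \mathbf{y}^T H(0) \mathbf{y},$$

where

$$\mathbf{y} = Q(\mathbf{x}) \mathbf{x}.$$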

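Now the map $\mathbf{x} \mapsto \mathbf{y}$ given by this formula is invertible by the inverse function theorem! In more detail: let $\phi: \mathbb{R}^2 \to \mathbb{R}^2$ be defined by

$$\phi(\mathbf{x}) = Q(\mathbf{x}) \mathbf{x}.$$

Then, by the product formula,

$$d\phi_0(\mathbf{x}) = (dQ_0(\mathbf{x})) 0 + Q(0) \mathbf{x} = \mathbf{x},$$

since $Q(0)$ is the identity matrix, so

$$d\phi_0 = \mathrm{id}.$$

Thus $\phi$ has a differentiable inverse $\psi$ near the origin, with $\mathbf{x} = \psi(\mathbf{y})$; but then $(\psi^* f)(\mathbf{y}) = f(\mathbf{x})$ so

$$\psi^* f(\mathbf{y}) = f(0) + \mathbf{y}^T H(0) \mathbf{y}.$$

In other words, $\psi^*(f - f(0, 0))$ is quadratic! We have proved that, near any non-degenerate critical point, it is possible to introduce coordinates $y_1$ and $y_2$ such that

$$f(y_1, y_2) = f(0) + Q(y_1, y_2)$$

where

$$Q(y_1, y_2) = \pm(y_1^2 + y_2^2) \quad \text{or} \quad y_1^2 - y_2^2.$$

Which of the three alternatives holds is determined by the normal form (the number of negative eigenvalues) of $d^2 f_0$.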

This proof is completely general - it works in $n$ dimensions. So, if 0 is a non-degenerate critical point of $f$, it is possible to find coordinates in terms of which

$$f(\mathbf{y}) = f(0) + Q(\mathbf{y})$$

where

$$Q(\mathbf{y}) = \pm y_1^2 \pm y_2^2 \pm \cdots \pm y_n^2.$$

The number of $-$ signs (called the index of $Q$) is the same as the number of negative eigenvalues of the matrix $d^2 f_0$. This result is known as Morse’s lemma. We will make use of this lemma in our study of asymptotic integrals in Chapter 21.
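In two dimensions the index can be computed directly from the matrix of second partial derivatives. The following is a rough sketch under our own assumptions: the three test functions are hypothetical examples with a critical point at the origin, the second derivatives are estimated by finite differences, and the eigenvalues of the symmetric $2 \times 2$ matrix are found from the closed formula (mean of the diagonal plus or minus a radius).

```python
import math


def hessian(f, x, y, eps=1e-4):
    """Finite-difference estimates (f_xx, f_xy, f_yy) of the second partials at (x, y)."""
    fxx = (f(x + eps, y) - 2 * f(x, y) + f(x - eps, y)) / eps**2
    fyy = (f(x, y + eps) - 2 * f(x, y) + f(x, y - eps)) / eps**2
    fxy = (f(x + eps, y + eps) - f(x + eps, y - eps)
           - f(x - eps, y + eps) + f(x - eps, y - eps)) / (4 * eps**2)
    return fxx, fxy, fyy


def index(fxx, fxy, fyy):
    """Number of negative eigenvalues of the symmetric matrix [[fxx, fxy], [fxy, fyy]]."""
    mean = (fxx + fyy) / 2
    radius = math.hypot((fxx - fyy) / 2, fxy)
    eigenvalues = (mean - radius, mean + radius)
    return sum(1 for e in eigenvalues if e < 0)


# Hypothetical functions, each with a non-degenerate critical point at the origin.
def bowl(x, y):
    return x**2 + y**2 + x**3


def saddle(x, y):
    return x * y + x**3


def cap(x, y):
    return -x**2 - y**2 + x * y


indices = [index(*hessian(f, 0.0, 0.0)) for f in (bowl, saddle, cap)]
print(indices)
```

An index of 0 corresponds to a minimum, 1 to a saddle point and 2 to a maximum.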


Summary

A Higher derivatives

You should be able to write down the Taylor expansion of a function on the plane through terms involving second partial derivatives.

You should be able to apply the chain rule in order to express second partial derivatives in terms of partial derivatives with respect to other coordinates.

B Critical points 

You should be able to locate the critical points of a function on the plane and to 
classify each critical point as a maximum, minimum, or saddle point. 

You should know how to use the method of Lagrange multipliers to find the 
critical values of a function of several variables subject to constraints. 

C Inverse functions

You should be able to state and apply the inverse function theorem.

You should know how to use Newton’s method to find an approximate solution to $f(p) = q$ where $f$ is a function from $\mathbb{R}^2$ to $\mathbb{R}^2$.


Exercises 



f 0 if x - y - 0 

6.1. Let r(x, y) =< 


1 

, - otherwise. 

U + y 2 -- 




(a) Calculate $\partial F/\partial x$ and $\partial F/\partial y$. Are they continuous at (0,0)?

(b) Calculate $\partial^2 F/\partial x \partial y$ and $\partial^2 F/\partial y \partial x$. Are they continuous at (0,0)? (Note: If they are not, you may not compute their values at the origin by finding a general formula and trying to let $x$ and $y$ both approach zero!)

(c) Show that $(\partial^2 F/\partial x \partial y)(0,0) \neq (\partial^2 F/\partial y \partial x)(0,0)$.

(d) Invent a smooth curve through the origin described by $x = X(t)$, $y = Y(t)$ with $X(0) = Y(0) = 0$, such that the function $G(t) = F(X(t), Y(t))$ is not differentiable at the origin.

6.2. Find and classify all the critical points of the function $F: \mathbb{R}^2 \to \mathbb{R}$ given by

$$F(x, y) = x^3 + y^3 - 3xy.$$

6.3. Let $F(x, y) = x^2 y - 3xy + \tfrac12 x^2 + y^2$.

(a) Find the equation of the tangent plane to the graph of $z = F(x, y)$ at the point $x = 2$, $y = 2$, $z = 2$.

(b) The function $F(x, y)$ has three critical points, two of which lie on the line $x = y$. Locate these critical points and classify each as maximum, minimum or saddle point.

6.4. Consider the function $F$ on $\mathbb{R}^2$ given by

$$F(x, y) = x^2 - 4xy + y^2 - 6x^{-1}.$$

(a) Find the equation of the plane tangent to the graph $z = F(x, y)$ at the point corresponding to $x = -1$, $y = -2$.

(b) Locate the critical point of this function and determine its nature.


6.5. Find and classify all critical points of the function F(x.y) = y 2 + 

r=-y(e~ 4jc ^-l) + 9x 2 q-6y 2 hasacriticalpoint 



at the origin, and determine the nature of this critical point. Describe the level curves of $F(x, y)$ in the neighborhood of the origin. Sketch a couple of typical curves. Describe the level curves of $F(x, y)$ in the neighborhood of the point $x = 0$, $y = 1$, and sketch typical curves.


6.7. Find the critical points of the following function

$$F(x, y) = 5x^3 - 3x^2 y + 6xy^2 - 4y^3 - 27x + 27y$$

and determine their nature. (At a suitable point in the calculation add two equations. The resulting homogeneous polynomial factors. The critical points have integer coordinates.)

6.8. (a) Find the critical points of the function $F(x, y) = xy^2 e^{-(x+y)}$.

(b) Determine the nature of the critical point which is not at the origin. Sketch, as accurately as you can, some level curves near the point.

(c) For the critical point at the origin, the Hessian vanishes and is no help. Figure out whether the critical point is a maximum, minimum, or saddle point. Sketch some level curves near the origin.

6.9. Let $x$ and $y$ be the usual affine coordinate functions on a plane. Another pair of coordinate functions on the right half-plane ($x > 0$) is defined by the equations

$$u = x^2 - y^2, \qquad v = 2xy.$$

(a) Express $du$ and $dv$ in terms of $dx$ and $dy$ and write the matrix which expresses $\begin{pmatrix} du \\ dv \end{pmatrix}$ in terms of $\begin{pmatrix} dx \\ dy \end{pmatrix}$ at the point $P$ with coordinates $x = 2$, $y = 1$, $u = 3$, $v = 4$.

(b) Find the approximate $x$ and $y$ coordinates of a point $Q$ such that $u(Q) = 3.5$, $v(Q) = 4$.

(c) Let $\phi$ denote the electric potential function on the plane. Given that at the point $P$ ($x = 2$, $y = 1$, $u = 3$, $v = 4$), $\partial\phi/\partial u = 2$ and $\partial\phi/\partial v = -1$, calculate $\partial\phi/\partial x$ and $\partial\phi/\partial y$ at this point. Describe the direction along which $\phi$ increases most rapidly.

(d) At the same point, express $\partial^2 \phi/\partial y \partial x$ in terms of partial derivatives of $\phi$ with respect to $u$ and $v$. Your answer may also involve explicit functions of $x$ and $y$, of course.

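6.10. Suppose that coordinates $u$ and $v$ on the plane are expressed in terms of $x$ and $y$ by

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}.$$

Let $f$ be a twice-differentiable function on the plane. Show that

$$\frac{\partial^2 f}{\partial u^2} + \frac{\partial^2 f}{\partial v^2} = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2}.$$

6.11. Polar coordinates $r$, $\theta$ on the plane are related to Cartesian coordinates by the equations

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}.$$

Suppose $f: \mathbb{R}^2 \to \mathbb{R}$ is a function satisfying Laplace’s equation,

$$\frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} = 0.$$

Express this equation entirely in terms of derivatives of $f$ with respect to $r$ and $\theta$.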


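6.12. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be a twice-differentiable function. If

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}$$

express $\partial^2 f/\partial\theta^2$ in terms of partial derivatives of $f$ with respect to $x$ and $y$.

6.13. Let $f: \mathbb{R}^2 \to \mathbb{R}$ be a twice-differentiable function. If $x \neq 0$ and

$$\begin{pmatrix} r \\ \theta \end{pmatrix} = \begin{pmatrix} \sqrt{x^2 + y^2} \\ \arctan(y/x) \end{pmatrix},$$

express $\partial^2 f/\partial x^2$ in terms of partial derivatives of $f$ with respect to $r$ and $\theta$.

6.14. Given that $\partial f/\partial x = f + \partial f/\partial y$, show that $\partial^2 f/\partial x^2 - \partial^2 f/\partial y^2 = f + 2\,\partial f/\partial y$.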

6.15. With polar coordinates as in exercise 6.11:

(a) Let $f$ be a function on the plane. Suppose that at the point whose coordinates are $x = 3$, $y = 4$, $\partial f/\partial x = 2$ and $\partial f/\partial y = 1$. Calculate $\partial f/\partial r$ and $\partial f/\partial\theta$ at this point.

(b) Suppose that $f$ satisfies the partial differential equation

$$\frac{\partial^2 f}{\partial r^2} + \frac{1}{r} \frac{\partial f}{\partial r} + \frac{1}{r^2} \frac{\partial^2 f}{\partial\theta^2} = 0.$$

Express this equation entirely in terms of partial derivatives of $f$ with respect to $x$ and $y$.



6.16. Suppose that $f$ is a function on the plane which satisfies Laplace’s equation $\partial^2 f/\partial x^2 + \partial^2 f/\partial y^2 = 0$. Express this equation in terms of the parabolic coordinates of exercise 5.13. It may involve $\partial^2 f/\partial u^2$, $\partial^2 f/\partial v^2$, $\partial^2 f/\partial u \partial v$, $\partial f/\partial u$, $\partial f/\partial v$, $u$, and $v$, but not $x$, $y$, or any partial derivatives with respect to $x$ or $y$.


6.17. Let 


be the mapping 


<t>\ 


x 


e + e y 


Show that $\phi$ can be inverted in the neighborhood of any point and compute the Jacobian of the inverse map.

6.18. Consider the surface in $\mathbb{R}^3$ defined by $z = F(x, y)$, where $F(x, y) = x^2 - 2xy + 2y^2 + 3x + 4y$.

(a) Find the best affine approximation to F near the point 

(b) Write the equation of the plane tangent to the surface at Y 

(c) Find the equation of the line normal to the surface at the same point. 

(d) The equation $F(x, y) = 8$ defines a function $y = g(x)$. Evaluate $g'(1)$.

6.19. An important problem of statistical mechanics is the following: Consider a physical system which can have energy $+E$, $0$ or $-E$. Let $x$ denote the probability that the energy is $+E$ and $y$ the probability that the energy is $-E$. Then $1 - x - y$ is the probability that the energy is zero. Maximize the entropy $S$, defined as

$$S(x, y) = -x \log x - y \log y - (1 - x - y) \log(1 - x - y)$$

subject to the constraint that the average energy is $E_0$; i.e.,

$$F(x, y) = xE - yE = E_0.$$

Solve this problem using a Lagrange multiplier $\beta$, and show that $x \propto e^{-\beta E}$, $y \propto e^{+\beta E}$, where $E_0 = -2E \sinh \beta E$. (The Lagrange multiplier in this case turns out to equal $1/T$, where $T$ is absolute temperature.)


6.20. Consider the function a: !R 2 -> [R 2 defined by the formula 

' X \\ / v„3/2 

a(P) = Fl ' 



xy" 


x 3/2 — y 2 / 


(a) Calculate the 2x2 Jacobian matrix which represents the linear part of 


the best affine approximation to a near the point 


matrix to determine the approximate value of F 


(b) Use the matrix to obtain an approximate solution of F 
/ 4.2\ 



. Use this 



, 6 . 6 , 


6.21. Let $f(x, y) = \sqrt{x^2 y^4 + 9x^2}$.

(a) Find the best affine approximation to this function near the point $x = 1$, $y = 2$.

(b) At the point $x = 1$, $y = 2$, along what direction is the rate of change of the function $f(x, y)$ greatest?

(c) A solution of the equations

$$f(x, y) = 5,$$
$$x + y = 3$$

is $x = 1$, $y = 2$. Construct an approximate solution to the equations

$$f(x, y) = 5.34,$$
$$x + y = 3.05$$

by using the approximation from part (a).

6.22. Functions $s$ and $t$ are defined in terms of the affine coordinate functions $x$ and $y$ on the region $x > 0$, $y > 0$ of the plane by

$$s = xy, \qquad t = \log y - \log x.$$

(a) Express the differentials $ds$ and $dt$, at the point whose coordinates are $x = 1$, $y = 2$, in terms of $dx$ and $dy$.

(b) At the point $x = 1$, $y = 2$, the values of $s$ and $t$ are $s = 2$, $t = \log 2 \approx 0.693$. Use the Jacobian matrix at this point to find the approximate $x$ and $y$ coordinates of a point where $s = 2.02$, $t = 0.723$.

(c) Let $f$ be a twice differentiable function on the plane. Express $\partial f/\partial x$, $\partial f/\partial y$ and $\partial^2 f/(\partial y \partial x)$ in terms of $x$, $y$, and partial derivatives of $f$ with respect to $s$ and $t$.

Chapters 7 and 8 are meant as a first introduction to the integral calculus. Chapter 7 is devoted to the study of linear differential forms and their line integrals. Particular attention is paid to the behavior under change of variables. Other one-dimensional integrals such as arc length are also discussed.


Introduction 


In this chapter we shall disclose the true geometric meaning of linear differential forms; they are objects which are to be integrated over oriented paths to yield numbers. We begin with some examples. Consider the one-form

$$\omega = \tfrac12 (x\,dy - y\,dx).$$

By its definition it is the rule which assigns to every point $\begin{pmatrix} x \\ y \end{pmatrix}$ the row vector $\tfrac12(-y, x)$. Now a row vector is a linear function on vectors. The row vector $\tfrac12(-y, x)$ is the linear function that assigns to the vector $h = \begin{pmatrix} r \\ s \end{pmatrix}$ the number

$$\omega[h] = \tfrac12 (xs - yr)$$

which is just the oriented area of the triangle from the origin to $\begin{pmatrix} x \\ y \end{pmatrix}$ to $\begin{pmatrix} x + r \\ y + s \end{pmatrix}$.

Suppose we had a curve $\alpha(t) = \begin{pmatrix} x(t) \\ y(t) \end{pmatrix}$. We can choose a number of points $p_i$ on the curve and let $h_i = p_{i+1} - p_i$. Then the sum

$$\sum_i \omega(p_i)[h_i]$$
Figure 7.2 


is the total oriented area of the various triangles. In the limit, as the polygon joining the $p_i$ approximates the curve, we expect this sum will tend to a limit - the (oriented) area swept out by the radius vector moving along the curve. We have already encountered this notion in our study of Kepler’s second law.
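The sum of triangle areas is easy to compute. As a minimal sketch (the choice of the unit circle and of 1000 subdivision points is ours, for illustration only), the following adds up the numbers $\tfrac12(xs - yr)$ for the successive displacement vectors along the unit circle; the total should be close to the area $\pi$ of the unit disk.

```python
import math


def omega(p, h):
    """The one-form (1/2)(x dy - y dx) at the point p, applied to the vector h."""
    x, y = p
    r, s = h
    return 0.5 * (x * s - y * r)


def swept_area(curve, a, b, n):
    """Sum of omega(p_i)[h_i] over n chords of the curve, for a <= t <= b."""
    points = [curve(a + (b - a) * i / n) for i in range(n + 1)]
    total = 0.0
    for p, q in zip(points, points[1:]):
        h = (q[0] - p[0], q[1] - p[1])  # displacement from p_i to p_{i+1}
        total += omega(p, h)
    return total


def unit_circle(t):
    return (math.cos(t), math.sin(t))


area = swept_area(unit_circle, 0.0, 2 * math.pi, 1000)
print(area, math.pi)
```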


A second example to keep in mind is the notion of a force field. In three dimensions, a force field

$$\omega = F\,dx + G\,dy + H\,dz$$

gives a linear function $(F, G, H)$ at each point of space. This linear function measures the resistance or impetus to any infinitesimal motion. It assigns

$$(F, G, H)[v] = F v_x + G v_y + H v_z$$

to any displacement vector $v$ at the point $p$. Along any path, $\Gamma$, we expect to be able to integrate and get

$$\int_\Gamma \omega = \text{the work done by moving along } \Gamma.$$

Notice that we wish to be able to assign work to all paths. We can imagine a two-dimensional universe in which a force field would be

$$\omega = F\,dx + G\,dy.$$

For example, see figure 7.3, we can imagine feeling the influence of gravity while being constrained to move on a surface

$$z = f(x, y).$$

The force field would then be proportional to

$$\omega = df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$

Suppose we had a perfectly reversible electric car. (By perfectly reversible we mean that all the energy of braking is returned to the battery - no air or other kind of resistance.) We could keep track, using a meter, of the total energy flow into and out of the battery. Let us call this $B_F - B_I$ (the difference between the final and initial readings of the battery). We can also consider the kinetic energy at the beginning and end of the trip, $KE_I$ and $KE_F$. The principle of conservation of energy says that

$$KE_F - KE_I + B_F - B_I = \int_\Gamma \omega = \text{the work done along the path.}$$

(Throughout this discussion we are assuming that the forces are not velocity dependent: that there is a definite force field where the force depends only on the location in space.)


Notice a subtle difference in viewpoint from the use of force in Newton’s laws. In Newton’s laws, we are interested in predicting how a particle will move - if we set a pebble rolling on our surface, how will it continue to move? Newton says that the motion is determined by the equations

$$\frac{dp}{dt} = F, \qquad p = mv, \qquad v = \text{velocity vector}.$$

In our present discussion we are interested in how much energy is used in driving along a given curve. The force field $\omega$ assigns energies to paths $\Gamma$. If $\omega = df$, then we expect that the total work done along the path is just the potential difference

$$f(Q) - f(P)$$

where $P$ and $Q$ are the initial and final points of the path.

Figure 7.4

With this motivation in mind, we now turn to the mathematical discussion. 


7.1. Paths and line integrals


By an oriented path in the plane (or in $\mathbb{R}^k$) we shall mean a curve which is to be traversed in a specified sense. A path like $\Gamma_1$, whose endpoints do not coincide, has a well-defined ‘beginning’ ($P_a$ in figure 7.5) and ‘end’ ($P_b$); interchanging ‘beginning’ and ‘end’ reverses the orientation. A closed path like $\Gamma_2$ has no well-defined endpoints; any point $P$ can function as both ‘beginning’ and ‘end’. For this sort of closed path, it is still possible to assign an orientation, which then determines a ‘beginning’ and ‘end’ for any piece of the path.



Figure 7.5 Figure 7.6 


Physically, such a path is appropriate to represent the trajectory of a particle when we record which points the particle passed through and in what order, but not the speed with which the particle traveled. It is permissible for a segment of a path to be traversed two or more times; for example, a particle might move twice counterclockwise around the unit circle, or it might move from $P_a$ to $Q$, then back to $R$, then forward again to $P_b$. Such paths may be difficult to represent unambiguously by drawing curves with arrows attached, but they make good physical sense, and as we shall see, they are easy to describe in terms of functions.




We shall restrict our attention exclusively to piecewise differentiable paths - continuous paths for which a well-defined tangent exists at all except possibly a finite number of points. Such a path can be described as the image of an interval of the real line under a continuous map

$$\alpha: \mathbb{R} \to \mathbb{R}^2$$

which is differentiable except possibly at finitely many points. The function $\alpha$ is called a parameterization of the path. Physically we may think of $\alpha$ as the function which assigns to each instant of time the position of the particle at that instant. We usually describe $\alpha$ by specifying the pullback of the



Figure 7.9 


coordinate functions; that is, by writing formulas which give the numerical values of 
the x- and y-coordinates as functions of time. For example, if a particle moves along 


a circular arc of radius R from 




we may describe its path by 









We approximate such a path by a sequence of displacement vectors laid head to tail. We simply introduce subdivision points, $P_0, P_1, P_2, \ldots, P_N$, where $P_0 = P_a$ is the beginning of the path, $P_N = P_b$ its end, being sure to include as subdivision points any points where the path fails to be differentiable. The subdivision points determine displacement vectors

$$v_0 = \overrightarrow{P_0 P_1}, \quad v_1 = \overrightarrow{P_1 P_2}, \quad \ldots, \quad v_{N-1} = \overrightarrow{P_{N-1} P_N}.$$

By choosing the subdivision points close enough together, we can in this manner approximate a piecewise smooth path as closely as we like by a polygon. Some continuous paths cannot be well-approximated in this manner, but such paths cannot be parameterized by differentiable functions, and we shall not concern ourselves with them.

Each segment of the polygon has a beginning point $P_i$ and a displacement vector $v_i$ attached. The differential form $\omega$ assigns to such a segment the real number $\omega(P_i)[v_i]$. We form the sum over all segments:

$$I_N = \sum_{i=0}^{N-1} \omega(P_i)[v_i]$$

which is very much like a Riemann sum for an ordinary integral. We now take the limit as the number of subdivision points increases in such a way that all vectors $v_i$ approach zero. If this limit exists, independent of the precise manner in which the subdivision points are chosen, it defines the line integral of the one-form $\omega$ over the path $\Gamma$, which we denote by $\int_\Gamma \omega$. That is,

$$\int_\Gamma \omega = \lim_{N \to \infty} \sum_{i=0}^{N-1} \omega(P_i)[v_i].$$

We shall soon prove that for piecewise differentiable paths, the limit exists, and is independent of the subdivision, and shall give a formula for $\int_\Gamma \omega$ using pullback.

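Three properties of the line integral $\int_\Gamma \omega$ are apparent from the definition:

(1) $\int_\Gamma \omega$ is linear in $\omega$: that is, if $\omega = \omega_1 + \omega_2$, then $\int_\Gamma \omega = \int_\Gamma \omega_1 + \int_\Gamma \omega_2$, and if $\omega = c\,\omega_1$ for a real number $c$, then $\int_\Gamma \omega = c \int_\Gamma \omega_1$. This follows from the definition of the sum of differential forms and the product of a differential form and a real number.

(2) If $\Gamma$ consists of $\Gamma_1$ followed by $\Gamma_2$, then

$$\int_\Gamma \omega = \int_{\Gamma_1} \omega + \int_{\Gamma_2} \omega.$$

This implies that we can always subdivide a piecewise differentiable path into differentiable portions, as suggested by figure 7.14, and calculate the line integral over each portion.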


Figure 7.14

(3) Reversing the orientation of $\Gamma$ changes the sign of the line integral. This is true because reversing the orientation of $\Gamma$ just changes the sign of each of the vectors $v_i$, and, since $\omega$ is linear,

$$\omega(P)[-v] = -\omega(P)[v].$$

We turn now to the problem of computing the numerical value of a line integral. The strategy is to reduce the problem to calculation of an ordinary integral over the parameter for the path of integration. The parameterization

$$\alpha: \mathbb{R} \to \mathbb{R}^2$$

maps an interval $[a, b]$ of the real line into the path $\Gamma$. We assume that $a$, the lower bound of the interval $[a, b]$, is mapped onto the beginning point $P_a$ of the path, while $b$ is mapped into the end point $P_b$. By looking separately at the smooth pieces of our path, we may assume that $\alpha$ can be described by a pair of differentiable functions,

$$\alpha^* x = X(t), \qquad \alpha^* y = Y(t),$$

so that, by the chain rule,

$$\alpha^* dx = X'(t)\,dt, \qquad \alpha^* dy = Y'(t)\,dt.$$

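We may assume that our subdivision of the path corresponds to a subdivision

$$a = t_0 < t_1 < \cdots < t_n = b$$

so $P_i = \alpha(t_i)$ and $v_i = \alpha(t_{i+1}) - \alpha(t_i)$. Thus, writing $\omega = g\,dx + h\,dy$, our approximating expression to the line integral can be written as

$$\sum_i \omega(P_i)[\alpha(t_{i+1}) - \alpha(t_i)] = \sum_i \bigl\{ g(P_i)\,dx[\alpha(t_{i+1}) - \alpha(t_i)] + h(P_i)\,dy[\alpha(t_{i+1}) - \alpha(t_i)] \bigr\}$$

$$= \sum_i \bigl\{ g(\alpha(t_i))(X(t_{i+1}) - X(t_i)) + h(\alpha(t_i))(Y(t_{i+1}) - Y(t_i)) \bigr\}. \tag{7.1}$$

Recall that

$$\alpha^* \omega = f\,dt$$

where

$$f(t) = g(\alpha(t)) X'(t) + h(\alpha(t)) Y'(t).$$

We will show (under appropriate hypotheses on $\alpha$, $g$ and $h$) that the approximating expression (7.1) converges to

$$\int_a^b f\,dt \tag{7.2}$$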

as the subdivision (and hence the polygonal approximation to our path) gets more and more refined. This will prove that the limit is independent of the choice of subdivisions. So we wish to compare (7.1) and (7.2) for a fixed subdivision and show that their difference tends to zero as the mesh size, $\max_i (t_{i+1} - t_i)$, goes to zero. Now we can write

$$X(t_{i+1}) - X(t_i) = \int_{t_i}^{t_{i+1}} X'(s)\,ds$$

and

$$Y(t_{i+1}) - Y(t_i) = \int_{t_i}^{t_{i+1}} Y'(s)\,ds$$

so (7.1) can be written as

$$\sum_i \left\{ g(\alpha(t_i)) \int_{t_i}^{t_{i+1}} X'(s)\,ds + h(\alpha(t_i)) \int_{t_i}^{t_{i+1}} Y'(s)\,ds \right\}$$

while (7.2) can be written as

$$\sum_i \int_{t_i}^{t_{i+1}} f(s)\,ds = \sum_i \left\{ \int_{t_i}^{t_{i+1}} g(\alpha(s)) X'(s)\,ds + \int_{t_i}^{t_{i+1}} h(\alpha(s)) Y'(s)\,ds \right\}.$$

The difference between these two expressions is that for (7.1) we have $g(\alpha(t_i))$ or $h(\alpha(t_i))$ occurring outside the integrals in each summand, while in (7.2) we have $g(\alpha(s))$ and $h(\alpha(s))$ occurring under the integral sign. It is intuitively clear that, for $g$ and $h$ continuous and $\alpha$ smooth, the sum of these differences is negligible for a fine enough subdivision. Here are the assumptions we shall make in order to get a precise estimate on the difference between (7.1) and (7.2). Weaker assumptions would suffice, but require more careful argument.

(i) We assume that $g$ and $h$ are uniformly continuous, i.e., that for any $\varepsilon > 0$ there is an $\eta > 0$ such that $\| P - Q \| < \eta$ implies that $|g(P) - g(Q)| < \varepsilon$ and $|h(P) - h(Q)| < \varepsilon$. This is an assumption about $\omega$. By the mean-value theorem it will hold (with $\eta N = \varepsilon$) if the derivatives $dg$ and $dh$ satisfy $\| dg \| < N$ and $\| dh \| < N$ at all points.

(ii) We assume that the derivative of $\alpha$ is bounded: there is a constant $M$ such that $\| \alpha'(t) \| < M$ for all $t$. This is an assumption about the path $\alpha$.

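By the mean-value theorem, we can find a $\delta > 0$ such that for any $t'$ and $t''$ with

$$|t' - t''| < \delta$$

we have

$$\| \alpha(t') - \alpha(t'') \| < \eta.$$

Let us choose our subdivision so that its mesh size is less than $\delta$, i.e., $|t_{i+1} - t_i| < \delta$ for all $i$. Thus by (i) we have

$$|g(\alpha(s)) - g(\alpha(t_i))| < \varepsilon \quad \text{for } t_i \le s \le t_{i+1}$$

with a similar estimate for $h$. Thus

$$\left| g(\alpha(t_i)) \int_{t_i}^{t_{i+1}} X'(t)\,dt - \int_{t_i}^{t_{i+1}} g(\alpha(s)) X'(s)\,ds \right| \le \varepsilon M |t_{i+1} - t_i|.$$

There is a similar estimate for the terms involving $h$, so the difference between (7.1) and (7.2) is at most

$$2 \varepsilon M \sum_i |t_{i+1} - t_i| = 2 \varepsilon M (b - a).$$

We can arrange to have $\varepsilon$ as small as we like by making $\delta$, i.e., the mesh size, small enough. Thus (7.1) converges to

$$\int_a^b f\,dt = \int_a^b \alpha^* \omega$$

where the right-hand side here is defined to be the left. We can thus write

$$\int_\Gamma \omega = \int_a^b \alpha^* \omega.$$

In this equation, the left-hand side has an obvious intuitive meaning, while we use the right-hand side for computation.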
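The agreement between the two sides can be watched numerically. The following is a sketch under assumptions of our own: the coefficients $g(x, y) = y$, $h(x, y) = x^2$ and the path $\alpha(t) = (t, t^2)$, $0 \le t \le 1$, are hypothetical choices, not taken from the text. The function `polygon_sum` evaluates the approximating expression (7.1), and `pullback_integral` evaluates (7.2) by the midpoint rule; for this choice the exact value is $\int_0^1 (t^2 + 2t^3)\,dt = 5/6$.

```python
def g(x, y):
    # Hypothetical coefficient of dx.
    return y


def h(x, y):
    # Hypothetical coefficient of dy.
    return x * x


def alpha(t):
    """Hypothetical path: a piece of the parabola y = x^2."""
    return (t, t * t)


def alpha_prime(t):
    """Derivative (X'(t), Y'(t)) of the path."""
    return (1.0, 2.0 * t)


def polygon_sum(n, a=0.0, b=1.0):
    """Expression (7.1): sum of omega(P_i)[v_i] over n chords of the path."""
    total = 0.0
    for i in range(n):
        x0, y0 = alpha(a + (b - a) * i / n)
        x1, y1 = alpha(a + (b - a) * (i + 1) / n)
        total += g(x0, y0) * (x1 - x0) + h(x0, y0) * (y1 - y0)
    return total


def pullback_integral(n, a=0.0, b=1.0):
    """Expression (7.2): midpoint rule for the integral of f(t) dt."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        x, y = alpha(t)
        dx, dy = alpha_prime(t)
        total += (g(x, y) * dx + h(x, y) * dy) * dt
    return total


exact = 5.0 / 6.0
print(polygon_sum(2000), pullback_integral(2000), exact)
```

As the number of subdivision points grows, the polygon sum approaches the value of the ordinary integral.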

Example 

As an example of the use of this result, we evaluate the integral I of co = 

- 3nJhr ,— 5 - 




Exact forms 

In this example, you will note, the value of the line integral depends on the path, not just upon the endpoints. This is true in general, but there is one important exception. Suppose that the one-form $\omega = df$. In this case

$$\int_\Gamma df = \int_a^b \alpha^*(df) = \int_a^b d(\alpha^* f).$$

By the fundamental theorem of calculus,

$$\int_a^b d(\alpha^* f) = \alpha^* f(b) - \alpha^* f(a).$$

But $\alpha^* f(b) = f(\alpha(b)) = f(P_b)$, where $P_b$ is the endpoint of $\Gamma$, and similarly, $\alpha^* f(a) = f(P_a)$. We conclude that, if $\Gamma$ extends from $P_a$ to $P_b$,

$$\int_\Gamma df = f(P_b) - f(P_a)$$

independent of the choice of $\Gamma$.
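Independence of the path is easy to test on an example. In the sketch below the function $f(x, y) = x^2 y + \sin x$ and the two paths from $(0, 0)$ to $(1, 2)$ are hypothetical choices of ours; the line integral of $df$ is computed along each path by the midpoint rule applied to the pullback, and compared with $f(P_b) - f(P_a)$.

```python
import math


def f(x, y):
    # Hypothetical potential function, chosen only for illustration.
    return x * x * y + math.sin(x)


def G(x, y):
    # Coefficient of dx in df, i.e. the partial derivative of f in x.
    return 2 * x * y + math.cos(x)


def H(x, y):
    # Coefficient of dy in df, i.e. the partial derivative of f in y.
    return x * x


def line_integral(path, velocity, a, b, n=4000):
    """Midpoint-rule value of the integral of G dx + H dy pulled back to [a, b]."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * dt
        x, y = path(t)
        vx, vy = velocity(t)
        total += (G(x, y) * vx + H(x, y) * vy) * dt
    return total


# Two different paths from (0, 0) to (1, 2).
straight = line_integral(lambda t: (t, 2 * t), lambda t: (1.0, 2.0), 0.0, 1.0)
curved = line_integral(
    lambda t: (t * t, 2 * math.sin(math.pi * t / 2)),
    lambda t: (2 * t, math.pi * math.cos(math.pi * t / 2)),
    0.0,
    1.0,
)
expected = f(1.0, 2.0) - f(0.0, 0.0)
print(straight, curved, expected)
```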

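Notice that this result, combined with the preceding calculation, shows that not every differential form $\omega$ can be written as $\omega = df$. Indeed, it is easy to write down a necessary condition: suppose

$$\omega = G\,dx + H\,dy = df = \frac{\partial f}{\partial x}\,dx + \frac{\partial f}{\partial y}\,dy.$$

By the equality of cross derivatives, i.e. since $\partial^2 f/\partial x \partial y = \partial^2 f/\partial y \partial x$, we must have

$$\frac{\partial G}{\partial y} = \frac{\partial H}{\partial x}.$$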

In the example 


dG 


dH 




= 2x. 


A differential form $\omega$ which can be written as $\omega = df$ is called exact.

We will now show that locally (we shall explain what this entails) the condition

$$\frac{\partial G}{\partial y} = \frac{\partial H}{\partial x}$$

is enough to guarantee that $\omega = df$ for some $f$, determined up to a constant.

We first choose a convenient point $P_a$ and declare that $f(P_a) = 0$. We then define $f$ by the rule $f(P) = \int_\Gamma \omega$, where $\Gamma$ is a convenient path extending from $P_a$ to $P$. Of course, we could add a constant to $f$ without changing its differential $df$. The choice of $P_a$ in effect chooses this constant of integration.

Let us describe this procedure in terms of coordinates. Suppose that

$$\omega = G(x, y)\,dx + H(x, y)\,dy.$$

For simplicity we may assume $P_a$ to be the origin, so that $f(0, 0) = 0$. The most convenient path $\Gamma$ joining the origin to the point $\begin{pmatrix} x \\ y \end{pmatrix}$ is a straight line segment, easily parameterized by

$$\alpha: t \mapsto \begin{pmatrix} xt \\ yt \end{pmatrix}, \qquad 0 \le t \le 1.$$

Since we are using $x$ and $y$ to describe the endpoints of the path $\Gamma$, we will use $\bar{x}$ and $\bar{y}$ as names for the dummy variables of integration. Thus

$$\alpha^* \bar{x} = xt, \qquad \alpha^* \bar{y} = yt,$$

$$\alpha^* d\bar{x} = x\,dt, \qquad \alpha^* d\bar{y} = y\,dt$$

and

$$\alpha^* \omega = G(xt, yt)\,x\,dt + H(xt, yt)\,y\,dt.$$

Then

$$f(P) = \int_0^1 \alpha^* \omega$$

so that

$$f(x, y) = \int_0^1 [x G(xt, yt) + y H(xt, yt)]\,dt \tag{7.3}$$

is a formula by which we reconstruct a function $f$ from $\omega$. Notice that this construction will succeed only if the functions $G$ and $H$ are defined everywhere on the path $\Gamma$.

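So far we have not used any hypothesis on $\omega$, other than that it be defined along the paths of integration. So we do not expect, in general, that $df = \omega$. Here is where our hypothesis will come in. Let us compute $\partial f/\partial x$. By differentiating with respect to $x$ under the integral sign in the definition of $f$, we see that

$$\frac{\partial f}{\partial x} = \int_0^1 G(xt, yt)\,dt + \int_0^1 \left\{ xt \frac{\partial G}{\partial x}(xt, yt) + ty \frac{\partial H}{\partial x}(xt, yt) \right\} dt.$$

Now

$$\frac{d}{dt}\bigl(t G(xt, yt)\bigr) = G(xt, yt) + tx \frac{\partial G}{\partial x}(xt, yt) + ty \frac{\partial G}{\partial y}(xt, yt)$$

so

$$G(x, y) = \int_0^1 \frac{d}{dt}\bigl(t G(xt, yt)\bigr)\,dt = \int_0^1 \left\{ G(xt, yt) + tx \frac{\partial G}{\partial x}(xt, yt) + ty \frac{\partial G}{\partial y}(xt, yt) \right\} dt.$$

Substituting this into the expression for $\partial f/\partial x$, we see that

$$\frac{\partial f}{\partial x} = G(x, y) + \int_0^1 \left( \frac{\partial H}{\partial x} - \frac{\partial G}{\partial y} \right) ty\,dt.$$

Under our assumption

$$\frac{\partial G}{\partial y} = \frac{\partial H}{\partial x}$$

everywhere, so

$$\frac{\partial f}{\partial x} = G.$$

A similar argument shows that $\partial f/\partial y = H$, so

$$\omega = df.$$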


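As an explicit example of this procedure, consider

$$\omega = 2xy^3\,dx + 3x^2 y^2\,dy.$$

Since $\omega$ is defined everywhere, and

$$\frac{\partial}{\partial y}(2xy^3) = 6xy^2 = \frac{\partial}{\partial x}(3x^2 y^2),$$

$\omega$ is a differential, $df$. We find $f$ by calculating

$$f(x, y) = \int_0^1 [x G(xt, yt) + y H(xt, yt)]\,dt,$$

$$f(x, y) = \int_0^1 (2x^2 y^3 + 3x^2 y^3) t^4\,dt = x^2 y^3.$$

Indeed,

$$d(x^2 y^3) = 2xy^3\,dx + 3x^2 y^2\,dy.$$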
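Formula (7.3) can also be evaluated numerically. The following minimal sketch applies the midpoint rule to the integral in (7.3) for the form of this example; the sample point $(1.5, -0.7)$ and the number of subintervals are arbitrary choices of ours. The result should agree with $x^2 y^3$.

```python
def G(x, y):
    # Coefficient of dx in the example form.
    return 2 * x * y**3


def H(x, y):
    # Coefficient of dy in the example form.
    return 3 * x**2 * y**2


def potential(x, y, n=2000):
    """Midpoint-rule value of the integral (7.3) along the segment from 0 to (x, y)."""
    dt = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        total += (x * G(x * t, y * t) + y * H(x * t, y * t)) * dt
    return total


x, y = 1.5, -0.7
print(potential(x, y), x**2 * y**3)
```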


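Closed forms that are not exact

If the functions $G$ and $H$ are defined throughout a region $R$ of the plane, and if every point of $R$ can be joined to the point $P_a$ by a straight line segment lying in $R$, then the condition $\partial G/\partial y = \partial H/\partial x$ is sufficient to show that $\omega = G\,dx + H\,dy$ is exact. If $G$ or $H$ fails to be defined at one point in the region, then this conclusion no longer holds.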



For example

$$\omega = \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy$$

satisfies $\partial G/\partial y = \partial H/\partial x$, but $\omega$ is not defined at $x = 0$, $y = 0$. In this case there exists no function $f$ for which $\omega = df$. Indeed, the integral of $\omega$ around the unit circle is easily shown to be different from zero. Take

$$\alpha^* x = \cos t, \qquad \alpha^* y = \sin t,$$

$$\alpha^* dx = -\sin t\,dt, \qquad \alpha^* dy = \cos t\,dt$$

so that

$$\alpha^* \omega = \sin^2 t\,dt + \cos^2 t\,dt = dt.$$

Then

$$\int_{\text{unit circle}} \omega = \int_0^{2\pi} dt = 2\pi.$$

If $\omega$ were exact, its integral around this closed path would have to be zero. In later chapters we shall consider this and related ideas, which are of great significance for electromagnetic theory, in detail.
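The role of the missing point can be seen numerically. In the sketch below (the second circle, centered at $(3, 0)$, is a hypothetical choice of ours) the form of this example is integrated around two circles of radius 1: one around the origin, where the integral is $2\pi$, and one that does not enclose the origin, where the integral vanishes.

```python
import math


def G(x, y):
    # Coefficient of dx in the example form.
    return -y / (x * x + y * y)


def H(x, y):
    # Coefficient of dy in the example form.
    return x / (x * x + y * y)


def circle_integral(cx, cy, radius=1.0, n=4000):
    """Midpoint-rule integral of G dx + H dy once counterclockwise around a circle."""
    dt = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * dt
        x = cx + radius * math.cos(t)
        y = cy + radius * math.sin(t)
        vx = -radius * math.sin(t)  # pullback of dx, per unit dt
        vy = radius * math.cos(t)   # pullback of dy, per unit dt
        total += (G(x, y) * vx + H(x, y) * vy) * dt
    return total


around_origin = circle_integral(0.0, 0.0)
away_from_origin = circle_integral(3.0, 0.0)
print(around_origin, away_from_origin)
```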

A form $\omega = G\,dx + H\,dy$ defined in some region of $\mathbb{R}^2$ is called closed if $\partial H/\partial x = \partial G/\partial y$. If the region is star-shaped we have proved that a closed form is exact. In general this is not true.


Pullback and integration 

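Since the definition of the line integral $\int_\Gamma \omega$ is independent of any specific choice of parameterization for $\Gamma$, it is clear that the calculated value of the integral cannot depend on the parameterization. Still, it is worth verifying this explicitly. Suppose that $\Gamma$ is parameterized in two ways, once by a parameter $t$ and once by a parameter $s$, with $a \le s \le b$.

Then there exists a one-to-one mapping $\alpha$ of the $s$-line into the $t$-line, as shown in figure 7.19, so that we may write the parameterizations of $\Gamma$ as $\beta$ (in terms of $t$) and $\beta \circ \alpha$ (in terms of $s$).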


Using the parameter $s$, we calculate

$$\int_\Gamma \omega = \int_a^b (\beta \circ \alpha)^*\omega = \int_a^b \alpha^*(\beta^*\omega) = \int_{\alpha(a)}^{\alpha(b)} \beta^*\omega$$

by the chain rule, and by the change-of-variables formula for ordinary integrals.

This is exactly what we would have obtained by using the parameter t. As a 
practical matter, this means that using a different parameterization is equivalent 
computationally to making a change of variable in the integral set up by using 
the original parameterization. 

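It is also possible to transform a line integral from one plane to another, as suggested by figure 7.20. Here $\beta$ is a differentiable one-to-one mapping of the path $\Gamma$ in plane $A$ into a path $\beta(\Gamma)$ in plane $B$. Given a one-form $\omega$ on plane $B$, we form its pullback $\beta^*\omega$ on plane $A$, whose value on a vector $\mathbf{v}$ at the point $P$ is

$$\beta^*\omega(P)[\mathbf{v}] = \omega(\beta(P))[d\beta[\mathbf{v}]].$$

Figure 7.20

We claim that

$$\int_{\beta(\Gamma)} \omega = \int_\Gamma \beta^*\omega.$$

Indeed, by definition,

$$\int_\Gamma \beta^*\omega = \lim_{N\to\infty} \sum_{i=0}^{N-1} \beta^*\omega(P_i)[\mathbf{v}_i] = \lim_{N\to\infty} \sum_{i=0}^{N-1} \omega(\beta(P_i))[d\beta[\mathbf{v}_i]].$$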


But, as $N \to \infty$, the vectors $d\beta[\mathbf{v}_i]$ lie along the path $\beta(\Gamma)$ more and more closely, so that this last sum, in the limit $N \to \infty$, equals the integral $\int_{\beta(\Gamma)} \omega$. Indeed, if we parameterize $\Gamma$ by the mapping $\alpha$, we have

$$\int_\Gamma \beta^*\omega = \int_a^b \alpha^*(\beta^*\omega) = \int_a^b (\beta \circ \alpha)^*\omega$$

by the chain rule. But $\beta \circ \alpha$ is a parameterization of $\beta(\Gamma)$, so this last integral equals $\int_{\beta(\Gamma)} \omega$, which is what we wanted to prove.



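An important consequence of this last result is that we may introduce any convenient coordinate system in the plane for purposes of evaluating a line integral. For example, if we wish to evaluate the integral of $\omega = x\,dy - y\,dx$ over the unit semicircle $\Gamma$ from $\binom{1}{0}$ to $\binom{-1}{0}$ in the $xy$-plane, we may express the semicircle as the image of a directed line segment $\Gamma'$ in the polar coordinate plane by means of the mapping

$$\beta: \begin{pmatrix} r \\ \theta \end{pmatrix} \mapsto \begin{pmatrix} r\cos\theta \\ r\sin\theta \end{pmatrix}.$$

Then

$$\beta^*\omega = (r\cos\theta)\,d(r\sin\theta) - (r\sin\theta)\,d(r\cos\theta) = r^2\,d\theta,$$

so that, since $r = 1$ on $\Gamma'$,

$$\int_\Gamma \omega = \int_{\Gamma'} \beta^*\omega = \int_0^\pi d\theta = \pi.$$

Of course, calculation of $\int_\Gamma \omega$ using the obvious parameterization $t \mapsto \binom{\cos t}{\sin t}$ leads to exactly the same integral.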

7.2. Arc length 

So far we have considered only directed line integrals, evaluated over an oriented path on a plane where no scalar product is necessarily defined. Given a scalar product, it is possible to define the absolute line integral of a function, the integral $\int_\Gamma f\,ds$. As before, we break the path $\Gamma$ into short segments, with $\mathbf{v}_i = \overrightarrow{P_iP_{i+1}}$, then take the length of each segment by using the scalar product: $s_i = \|\mathbf{v}_i\| = \sqrt{(\mathbf{v}_i, \mathbf{v}_i)}$. The integral is again defined as the limit of a sum:

$$\int_\Gamma f\,ds = \lim_{N\to\infty} \sum_{i=0}^{N-1} f(P_i)\,s_i.$$

Clearly in this case the orientation of $\Gamma$ does not matter, since the length of $\mathbf{v}_i$ is the same as the length of $-\mathbf{v}_i$.

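To evaluate an absolute line integral, it is again convenient to parameterize $\Gamma$. We write

$$\alpha^*x = X(t), \quad \alpha^*y = Y(t)$$

so that

$$\alpha^*dx = X'(t)\,dt, \quad \alpha^*dy = Y'(t)\,dt.$$

Then the vector $\mathbf{u}_i = \overrightarrow{t_it_{i+1}}$ is mapped into the vector

$$d\alpha[\mathbf{u}_i] = \begin{pmatrix} X'(t_i) \\ Y'(t_i) \end{pmatrix} dt[\mathbf{u}_i]$$

so that

$$\|d\alpha[\mathbf{u}_i]\| = \sqrt{X'(t_i)^2 + Y'(t_i)^2}\;dt[\mathbf{u}_i].$$

Figure 7.23

By definition

$$f(P_i) = f(\alpha(t_i)) = \alpha^*f(t_i)$$

so that

$$\int_\Gamma f\,ds = \lim_{N\to\infty} \sum_{i=0}^{N-1} \alpha^*f(t_i)\sqrt{X'(t_i)^2 + Y'(t_i)^2}\,(t_{i+1} - t_i)$$

which may be recognized as the integral

$$\int_\Gamma f\,ds = \int_a^b \alpha^*f(t)\sqrt{X'(t)^2 + Y'(t)^2}\,dt.$$

If $f = 1$, the integral $\int ds$ defines the length of the curve $\Gamma$. More generally, $f(P)$ might represent the linear mass density of a thin wire in the form of the curve $\Gamma$.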
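The parameterized formula for $\int_\Gamma f\,ds$ translates directly into a numerical recipe. The sketch below is illustrative only: the function name `absolute_line_integral`, the choice of the parabola $y = x^2$ as test curve, and the use of the midpoint rule are all assumptions made here.

```python
import math


def absolute_line_integral(f, X, Y, dX, dY, a, b, n=2000):
    """Midpoint-rule value of the integral of f ds along t -> (X(t), Y(t))."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        speed = math.hypot(dX(t), dY(t))  # sqrt(X'(t)^2 + Y'(t)^2)
        total += f(X(t), Y(t)) * speed * h
    return total


one = lambda x, y: 1.0  # f = 1 gives the length of the curve

# Parabola y = x^2 for 0 <= x <= 1, traversed left to right.
length = absolute_line_integral(
    one, lambda t: t, lambda t: t * t, lambda t: 1.0, lambda t: 2.0 * t, 0.0, 1.0
)

# The same curve traversed right to left: the orientation is reversed.
reversed_length = absolute_line_integral(
    one,
    lambda t: 1.0 - t, lambda t: (1.0 - t) ** 2,
    lambda t: -1.0, lambda t: -2.0 * (1.0 - t),
    0.0, 1.0,
)

# Closed form of the integral of sqrt(1 + 4 t^2) over [0, 1].
exact = math.sqrt(5) / 2 + math.asinh(2) / 4
print(length, reversed_length, exact)
```

Both traversals give the same length, in contrast with a directed line integral, whose sign would flip.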


A similar integral can be defined even when the scalar product is not positive definite. An important example is the calculation of the proper time associated with the world line of a moving particle, which is the elapsed time as measured by a clock moving with the particle. In this case, since $t$ always increases along
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Now 


of the vector 




$$\int \sqrt{1 - v^2}\,dt$$


as the integral which defines proper time. 


Summary 

A Line integrals 

You should be able to explain the meaning of a line integral of the form $\int_\Gamma \omega$ and to list and apply properties of this integral.

You should know the prescription for evaluating a line integral by pullback, and you should be able to introduce appropriate parameterizations for evaluating line integrals.


B Differentials and differential forms 

Given a one-form $\omega$ defined on a star-shaped region of the plane, you should be able to determine whether $\omega$ is the differential of a function $f$ and to find such a function $f$ if one exists.


Exercises 

7.1. Let $\omega = (y\cos xy + e^x)\,dx + (x\cos xy + 2y)\,dy$.


(a) Evaluate J r co along the segment of the parabola y — x 2 from to 
( Y Use the parameterization <j) described by the pullback 


(b) Evaluate j r co for the case where T is the straight line joining the origin 


to the point y J. Do the same for the case where F consists of the segment 

0^x^a on the x-axis, followed by the segment x = a, 0 < y ^ 

(c) Find a function $f(x, y)$ such that $\omega = df$.

^fa) Evaluate j r co along the p arabola y defin ed by 
_ ^ X l=f f Y for O^tCl. 


(b) Find $f(x, y)$ such that $\omega = df$.
7.3. Let $\omega = y\,dx - x\,dy$.

(a) Evaluate $\int_\gamma \omega$ along the semicircle $\gamma$ from $\binom{-1}{0}$ to $\binom{1}{0}$ defined by

$$\binom{x}{y} = \binom{-\cos t}{\sin t} \quad \text{for } 0 \le t \le \pi.$$

(b) Show explicitly that you can obtain a different value from that in (a) by choosing a different curve joining $\binom{-1}{0}$ to $\binom{1}{0}$.


7.4. Let $\omega = (15x^2y^2 - 3y)\,dx + (10x^3y - 3x)\,dy$. Evaluate $\int_\Gamma \omega$, where $\Gamma$ is the path from $(-1, 0)$ to $(1, 0)$ along the semicircle $x^2 + y^2 = 1$, $y > 0$.

7.5.(a) Evaluate $\int_\Gamma \omega$, where

$$\omega = dx + 2x\,dy$$

and $\Gamma$ is the segment of the parabola $x = 1 - y^2$ between $y = -1$ and $y = +1$, as shown in figure 7.25.







property that $\omega(\mathbf{v}) = 0$ for any vector $\mathbf{v}$ which is tangent to one of the curves.

(Hint: If $y = F(x)$, $\mathbf{v} = \binom{1}{F'(x)}$, and you have a differential equation for $F$.)

(c) Find functions $f(x, y)$ and $g(x, y)$ such that $df = g\omega$.

(Hint: $f$ must be constant along the curves which you found in part (b).)
7.9.(a) Sketch the semi-ellipse described by the polar equation

$$r = \frac{9}{5 - 4\cos\theta}$$

for $0 \le \theta \le \pi$.

Recalling that $x = r\cos\theta$, $y = r\sin\theta$, show that this semi-ellipse is part of the graph of

$$\frac{(x - 4)^2}{25} + \frac{y^2}{9} = 1.$$

(b) Express the differential form

$$\omega = \frac{xy\,dy - y^2\,dx}{x^2 + y^2}$$

in terms of polar coordinates (in terms of $r$, $\theta$, $dr$, and $d\theta$).

(c) Evaluate $\int_\Gamma \omega$, where $\Gamma$ is the semi-ellipse of part (a), using polar coordinates. The coordinate $\theta$ makes a convenient parameter.

(d) Evaluate $\int_\Gamma \omega$ by using $x$ and $y$ as coordinates. A convenient parameterization is the one defined by the mapping

$$t \mapsto \binom{4 + 5\cos t}{3\sin t}.$$



7.10.(a) Suppose that $u$ and $v$ are curvilinear coordinates on a region $D$ on the plane, with the Jacobian

$$\operatorname{Det}\begin{pmatrix} \partial u/\partial x & \partial u/\partial y \\ \partial v/\partial x & \partial v/\partial y \end{pmatrix}$$

nowhere zero on $D$. Let $\omega$ be a smooth differential form defined on $D$, let $\Gamma$ be a curve in $D$. Show that $\int_\Gamma \omega$ has the same value whether $\omega$ and $\Gamma$ are expressed in terms of $x$ and $y$ or in terms of $u$ and $v$. (The preceding problem was an example of this result.)

(b) Let $\Gamma$ be a closed path described in polar coordinates by $\rho = F(\theta)$, with $F(\theta) > 0$ and $F(2\pi) = F(0)$. Show that the area enclosed by this closed path equals $\int_\Gamma \omega$, where $\omega = \tfrac{1}{2}\rho^2\,d\theta$.

(Hint: Try expressing $\omega$ in terms of Cartesian coordinates.)

7.11. The state of a gas confined to a cylinder can be represented by a point in a plane. In terms of coordinates $P$ (pressure) and $V$ (volume) on this plane, the quantity of heat absorbed by the gas during a process represented by a path $\Gamma$ in the plane is $Q = \int_\Gamma \omega$, where

co = jPdV +1 VdP. 






(b) For these integers find a function $f$ such that $df = x^my^n\omega$.

(c) If we map the tu>-plane into the xy-plane so that 


/_.\ / -.2 , Q \ 





w 

1 + 2a / 

what is thf> rmllhar'L' r.i*) 

1 ” 

(Q\ /A 

(d) Calculate Jco over the path T, and F, connecting j 

and where 

A i-i —1- 


Tj goes in two straight segments via f \ and r 2 in two straight 


segments via 


0 s 

A, 


(e) Evaluate the absolute arc-length integral $\int_{\Gamma_1} \|\omega\|\,ds$.

(You may leave one term of your answer in the form of an ordinary 
definite integral.) 
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Chapter 8 continues the study of integral calculus and is 
devoted to the study of exterior two-forms and their corresponding two-dimensional integrals. The exterior derivative is introduced and invariance under pullback is stressed. The two-dimensional version of Stokes' theorem, i.e. Green's theorem, is proved. Surface integrals in three-space are studied.


8.1. Exterior derivative

We have already seen how the differential of a function $f$ provides the best linear approximation to the change in the value of $f$ as we move from a point $P$ to a nearby point $P + \mathbf{v}$. To be specific,

$$df(P)[\mathbf{v}] = f(P + \mathbf{v}) - f(P) + o(\mathbf{v})$$

where the error, $o(\mathbf{v})$, goes to zero faster than the length of $\mathbf{v}$ if $\mathbf{v}$ is made small. We can think of $df$ as a linear function whose value on the vector $\mathbf{v}$ is determined by the values of the function $f$ itself on the boundary (endpoints) of the segment defined by $\mathbf{v}$, in the limit where the vector $\mathbf{v}$ becomes very small.

Using a similar approach, we can construct from a one-form $\tau$ a two-form $d\tau$, called the exterior derivative of $\tau$, which is, at each point $P$, a bilinear function of two vectors $\mathbf{v}$ and $\mathbf{w}$: Given a point $P$, an ordered pair of vectors $\mathbf{v}$, $\mathbf{w}$, and a one-form $\tau$, we can obtain a number by integrating $\tau$ around the parallelogram spanned by $\mathbf{v}$ and $\mathbf{w}$, moving 'forward' from $P$ along the first vector $\mathbf{v}$ of the pair, eventually backwards along the second vector $\mathbf{w}$. If the vectors $\mathbf{v}$ and $\mathbf{w}$ are small enough, and $\tau$ is reasonably well-behaved near $P$, then we expect the value of this integral to be approximately bilinear, i.e., to depend approximately linearly on $\mathbf{v}$ (for fixed $\mathbf{w}$) and linearly on $\mathbf{w}$ (for fixed $\mathbf{v}$). Denoting the parallelogram spanned by $h\mathbf{v}$ and $k\mathbf{w}$ by $P(h, k)$, we would like to define $d\tau(P)$ by

$$\int_{P(h,k)} \tau = hk\,d\tau(P)[\mathbf{v}, \mathbf{w}] + \text{error} \qquad (8.1)$$

where (we hope) the error term goes to zero faster than $h$ as $h \to 0$ (with $k$ fixed), and also goes to zero faster than $k$ as $k \to 0$ (with $h$ fixed).

Figure 8.1


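The proof that $d\tau$, if it exists, is uniquely determined by equation (8.1) is similar to the proof that the differential of a function is unique. Suppose that equation (8.1) holds for two different bilinear functions $d\tau$ and $\widetilde{d\tau}$. Then, letting $\sigma$ denote the difference $d\tau - \widetilde{d\tau}$, we would have

$$0 = hk\,\sigma[\mathbf{v}, \mathbf{w}] + \text{error}.$$

Dividing by $hk$ and letting $h$ approach zero, we find

$$0 = \sigma[\mathbf{v}, \mathbf{w}] + \frac{1}{k}\lim_{h\to 0}(\text{error}/h).$$

But the error approaches zero faster than $h$, so $\sigma[\mathbf{v}, \mathbf{w}] = 0$. This proves that $\sigma$ is the zero function, so that $\widetilde{d\tau}$ cannot be different from $d\tau$.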


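We turn next to the problem of calculating $d\tau(P)$ and proving that it exists. For simplicity, we assume initially that $\tau$ is of the form $f\,dx$, where $f$ is twice differentiable everywhere near $P$. Here $dx$ is the form which assigns to every tangent vector its $x$-component. The contribution to the integral from one side of the parallelogram, say the side from $P$ to $P + h\mathbf{v}$, is found by using the parameterization $t \mapsto P + th\mathbf{v}$, $0 \le t \le 1$, so that the contribution is

$$\int_P^{P+h\mathbf{v}} \tau = h\,dx[\mathbf{v}] \int_0^1 f(P + th\mathbf{v})\,dt.$$

Figure 8.3

The contribution from the opposite side, from $P + h\mathbf{v} + k\mathbf{w}$ to $P + k\mathbf{w}$, is similarly

$$-h\,dx[\mathbf{v}] \int_0^1 f(P + k\mathbf{w} + th\mathbf{v})\,dt.$$

Together, these two sides contribute

$$-h\,dx[\mathbf{v}] \int_0^1 [f(P + th\mathbf{v} + k\mathbf{w}) - f(P + th\mathbf{v})]\,dt.$$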


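Since $f$ is assumed twice differentiable, we may apply Taylor's formula

$$f(P + th\mathbf{v} + k\mathbf{w}) - f(P + th\mathbf{v}) = df_{(P+th\mathbf{v})}[k\mathbf{w}] + O(k^2)$$

to write this last expression as

$$-h\,dx[\mathbf{v}] \int_0^1 df_{(P+th\mathbf{v})}[k\mathbf{w}]\,dt + O(hk^2).$$

From the other two sides of the parallelogram we obtain terms which combine similarly to give

$$+k\,dx[\mathbf{w}] \int_0^1 df_{(P+tk\mathbf{w})}[h\mathbf{v}]\,dt + O(h^2k).$$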


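Substituting these results into the integral around the parallelogram, we get

$$\int_{P(h,k)} \tau = hk\left[-\int_0^1 df_{(P+th\mathbf{v})}[\mathbf{w}]\,dt \cdot dx[\mathbf{v}] + \int_0^1 df_{(P+tk\mathbf{w})}[\mathbf{v}]\,dt \cdot dx[\mathbf{w}]\right] + O(h^2k) + O(hk^2).$$

Now $df_{(P+th\mathbf{v})}$ is just the row vector

$$\left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right)$$

evaluated at $P + th\mathbf{v}$. Since $f$ is twice differentiable, the partial derivatives of $f$ are themselves differentiable, and by the mean-value theorem

$$df_{(P+th\mathbf{v})}[\mathbf{w}] = df_P[\mathbf{w}] + O(h),$$

so that, upon integrating,

$$\int_0^1 df_{(P+th\mathbf{v})}[\mathbf{w}]\,dt = df_P[\mathbf{w}] + O(h),$$

and similarly $\int_0^1 df_{(P+tk\mathbf{w})}[\mathbf{v}]\,dt = df_P[\mathbf{v}] + O(k)$.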


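Substituting into our integral around the parallelogram gives

$$\int_{P(h,k)} \tau = hk\,(df_P[\mathbf{v}]\,dx[\mathbf{w}] - df_P[\mathbf{w}]\,dx[\mathbf{v}]) + O(h^2k) + O(hk^2).$$

We thus get our desired expression (8.1) if we set

$$d\tau_P[\mathbf{v}, \mathbf{w}] = df_P[\mathbf{v}]\,dx[\mathbf{w}] - df_P[\mathbf{w}]\,dx[\mathbf{v}].$$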


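We see that $d\tau$ is an antisymmetric function of its two arguments: $d\tau[\mathbf{w}, \mathbf{v}] = -d\tau[\mathbf{v}, \mathbf{w}]$. It is convenient to introduce the wedge product $\sigma \wedge \lambda$ of two one-forms, defined by

$$(\sigma \wedge \lambda)[\mathbf{v}, \mathbf{w}] = \sigma[\mathbf{v}]\lambda[\mathbf{w}] - \sigma[\mathbf{w}]\lambda[\mathbf{v}]$$

where $\sigma$ and $\lambda$ are one-forms, $\mathbf{v}$ and $\mathbf{w}$ are vectors. Then we may write

$$d\tau_P[\mathbf{v}, \mathbf{w}] = (df_P \wedge dx)[\mathbf{v}, \mathbf{w}]$$

or, more concisely,

$$d\tau = df \wedge dx.$$

From the definition of the wedge product it is clear that

$$\lambda \wedge \sigma = -\sigma \wedge \lambda,$$

i.e., the product is antisymmetric. In particular, $\sigma \wedge \sigma = -\sigma \wedge \sigma = 0$: the wedge product of any one-form with itself is zero.

It is also apparent that

$$(\sigma + \omega) \wedge \lambda = (\sigma \wedge \lambda) + (\omega \wedge \lambda):$$

the wedge product is distributive with respect to addition.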
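The algebraic properties of the wedge product are easy to experiment with. In this small Python sketch, one-forms with constant coefficients are modeled as functions of a vector; the helper names `one_form` and `wedge` and the particular coefficients are illustrative assumptions.

```python
def one_form(a, b):
    """The constant one-form a dx + b dy, as a function of a vector (vx, vy)."""
    return lambda v: a * v[0] + b * v[1]


def wedge(sigma, lam):
    """(sigma ^ lam)[v, w] = sigma[v] lam[w] - sigma[w] lam[v]."""
    return lambda v, w: sigma(v) * lam(w) - sigma(w) * lam(v)


dx, dy = one_form(1, 0), one_form(0, 1)
sigma, lam = one_form(2, 3), one_form(-1, 4)
v, w = (1.0, 2.0), (3.0, -1.0)

print(wedge(dx, dy)(v, w))                               # dx[v]dy[w] - dx[w]dy[v]
print(wedge(sigma, lam)(v, w), wedge(lam, sigma)(v, w))  # equal and opposite
print(wedge(sigma, sigma)(v, w))                         # a one-form wedged with itself
```

Expanding $(2\,dx + 3\,dy) \wedge (-dx + 4\,dy)$ by distributivity and antisymmetry gives $11\,dx \wedge dy$, which the code can be used to confirm on any pair of vectors.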

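Consider now the most general one-form $\tau = f\,dx + g\,dy$. The same argument applied to $g\,dy$ will lead to

$$d(g\,dy) = dg \wedge dy.$$

Since the integral of $\omega$ is linear in $\omega$, we get

$$d\tau = d(f\,dx + g\,dy) = df \wedge dx + dg \wedge dy,$$

as can also be verified directly from the definition of $d$. But we may express $df$ and $dg$ in terms of $dx$ and $dy$:

$$df = \frac{\partial f}{\partial x}dx + \frac{\partial f}{\partial y}dy, \qquad dg = \frac{\partial g}{\partial x}dx + \frac{\partial g}{\partial y}dy.$$

Since $dx \wedge dx = 0$ and $dy \wedge dy = 0$, we find

$$d\tau = \frac{\partial f}{\partial y}\,dy \wedge dx + \frac{\partial g}{\partial x}\,dx \wedge dy.$$

Finally, since $dy \wedge dx = -dx \wedge dy$, we have

$$d\tau = \left(\frac{\partial g}{\partial x} - \frac{\partial f}{\partial y}\right) dx \wedge dy.$$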


As an example, let

$$\tau = x^2y^2\,dx + x^3y\,dy.$$

Then

$$d\tau = d(x^2y^2) \wedge dx + d(x^3y) \wedge dy$$

$$= (2xy^2\,dx + 2x^2y\,dy) \wedge dx + (3x^2y\,dx + x^3\,dy) \wedge dy$$

$$= 2x^2y\,dy \wedge dx + 3x^2y\,dx \wedge dy = x^2y\,dx \wedge dy.$$
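The defining property (8.1) suggests a direct numerical check of this example: integrate $\tau$ around a small parallelogram and divide by $hk$. The sketch below does this for the coordinate square at an arbitrarily chosen point; the function names, the base point and the step sizes are assumptions made for illustration.

```python
def integrate_segment(f, g, p, q, n=200):
    """Midpoint-rule integral of f dx + g dy along the straight segment p -> q."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    total = 0.0
    for i in range(n):
        s = (i + 0.5) / n
        x, y = p[0] + s * dx, p[1] + s * dy
        total += (f(x, y) * dx + g(x, y) * dy) / n
    return total


def around_parallelogram(f, g, p, v, w, h, k):
    """Integral of f dx + g dy around P -> P+hv -> P+hv+kw -> P+kw -> P."""
    b = (p[0] + h * v[0], p[1] + h * v[1])
    c = (b[0] + k * w[0], b[1] + k * w[1])
    d = (p[0] + k * w[0], p[1] + k * w[1])
    corners = [p, b, c, d, p]
    return sum(integrate_segment(f, g, corners[i], corners[i + 1]) for i in range(4))


# tau = x^2 y^2 dx + x^3 y dy, as in the example.
f = lambda x, y: x**2 * y**2
g = lambda x, y: x**3 * y

p, v, w = (1.5, 2.0), (1.0, 0.0), (0.0, 1.0)
h = k = 1e-3
estimate = around_parallelogram(f, g, p, v, w, h, k) / (h * k)
predicted = p[0] ** 2 * p[1]  # x^2 y, the coefficient of dx ^ dy, on [e_x, e_y]
print(estimate, predicted)
```

The two numbers differ by a quantity that shrinks with $h$ and $k$, as the error term in (8.1) requires.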

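If $\tau$ is itself the differential of a function, $\tau = d\phi$, then $d\tau = d(d\phi) = 0$. This is obvious from the definition of $d\tau$: since $d\tau[\mathbf{v}, \mathbf{w}]$ is the best linear approximation to the line integral $\int \tau$ around a parallelogram, and since the integral of a differential around any closed path is zero, clearly $d(d\phi) = 0$. Alternatively, we may prove the same result by direct computation:

$$d(d\phi) = d\left(\frac{\partial\phi}{\partial x}dx + \frac{\partial\phi}{\partial y}dy\right) = \left(\frac{\partial^2\phi}{\partial x\,\partial y} - \frac{\partial^2\phi}{\partial y\,\partial x}\right) dx \wedge dy = 0$$

because of the equality of mixed second partial derivatives. Thus we see that the condition for a form $\omega$ to be closed is precisely that

$$d\omega = 0.$$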

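We have shown that, if $f$ is differentiable and if $x$ is the coordinate function, then $d(f\,dx) = df \wedge dx$. We now use this result to prove a more general product formula

$$d(f\tau) = df \wedge \tau + f\,d\tau$$

where $f$ is a differentiable function and $\tau$ a differentiable one-form. Writing $\tau = g\,dx + h\,dy$, we have $f\tau = (fg)\,dx + (fh)\,dy$, so that

$$d(f\tau) = d(fg) \wedge dx + d(fh) \wedge dy$$

$$= (g\,df + f\,dg) \wedge dx + (h\,df + f\,dh) \wedge dy$$

$$= df \wedge (g\,dx + h\,dy) + f\,(dg \wedge dx + dh \wedge dy).$$

Therefore

$$d(f\tau) = df \wedge \tau + f\,d\tau.$$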


8.2. Two-forms 

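Since the most general one-form in the plane has the expression $f\,dx + g\,dy$, the most general product of two one-forms will be some function multiple of $dx \wedge dy$. We call such an expression a two-form, so a two-form looks like

$$\sigma = f\,dx \wedge dy = F(x, y)\,dx \wedge dy.$$

We want to think of the value of the two-form at $P$, i.e., $F(P)\,dx \wedge dy$, as a rule which assigns numbers to pairs of vectors.

To understand the 'constant' two-form $dx \wedge dy$, we first evaluate it on the pair of unit vectors $\mathbf{e}_x$, $\mathbf{e}_y$:

$$dx \wedge dy[\mathbf{e}_x, \mathbf{e}_y] = dx[\mathbf{e}_x]\,dy[\mathbf{e}_y] - dx[\mathbf{e}_y]\,dy[\mathbf{e}_x] = 1 \cdot 1 - 0 \cdot 0 = 1.$$

Figure 8.4

More generally,

$$dx \wedge dy[h\mathbf{e}_x, k\mathbf{e}_y] = hk.$$

Clearly this is the area of the rectangle defined by the vectors $h\mathbf{e}_x$ and $k\mathbf{e}_y$, in units where the rectangle defined by $\mathbf{e}_x$ and $\mathbf{e}_y$ is taken to have unit area.

More generally still, we can evaluate $dx \wedge dy$ on an ordered pair of vectors $(\mathbf{v}, \mathbf{w})$. We may write $\mathbf{v} = a\mathbf{e}_x + c\mathbf{e}_y$, $\mathbf{w} = b\mathbf{e}_x + d\mathbf{e}_y$, so that, in terms of the matrix

$$A = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$

$\mathbf{v} = A\mathbf{e}_x$ and $\mathbf{w} = A\mathbf{e}_y$. Then

$$dx \wedge dy[\mathbf{v}, \mathbf{w}] = dx[\mathbf{v}]\,dy[\mathbf{w}] - dx[\mathbf{w}]\,dy[\mathbf{v}] = ad - bc = \operatorname{Det} A.$$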






- double integrals 

A general two-form $\sigma$ is a function of three variables $\mathbf{p}$, $\mathbf{v}$, and $\mathbf{w}$. For fixed $\mathbf{p}$ it is a bilinear and antisymmetric function of $\mathbf{v}$ and $\mathbf{w}$. If $\sigma = f\,dx \wedge dy$ then

$$\sigma(\mathbf{p})(\mathbf{v}, \mathbf{w}) = f(\mathbf{p}) \operatorname{Det} A$$

where $\mathbf{v}$, $\mathbf{w}$ and $A$ are as above. We can think of $\sigma(\mathbf{p})$ as assigning a notion of signed area to each parallelogram based at $\mathbf{p}$. The signed area of the parallelogram spanned by $\mathbf{v}$ and $\mathbf{w}$ (in that order) is $\sigma(\mathbf{p})(\mathbf{v}, \mathbf{w})$.

It is important to remember that the value of $dx \wedge dy$ on a parallelogram depends on the orientation of the parallelogram, as determined by the ordering of the vectors which define the parallelogram. On the oriented rectangle which corresponds to the pair $[\mathbf{e}_x, \mathbf{e}_y]$, the value of $dx \wedge dy$ is $+1$, while the value of $dy \wedge dx$ is $-1$. On the same rectangle with opposite orientation, which corresponds to the pair $[\mathbf{e}_y, \mathbf{e}_x]$, the value of $dy \wedge dx$ is $+1$ but the value of $dx \wedge dy$ is $-1$. More generally, to evaluate a two-form $\tau$ on a parallelogram defined by $\mathbf{v}$ and $\mathbf{w}$, we look at the orientation to determine which vector, $\mathbf{v}$ or $\mathbf{w}$, is 'first', then evaluate $\tau[\mathbf{v}, \mathbf{w}]$ or $\tau[\mathbf{w}, \mathbf{v}]$ as appropriate.
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t 

-\ 


$dx \wedge dy[\mathbf{e}_x, \mathbf{e}_y] = +1 \qquad dx \wedge dy[\mathbf{e}_y, \mathbf{e}_x] = -1$

Figure 8.5


Integrating two-forms

Figure 8.6

Since a two-form $\tau$ assigns a number to each small oriented parallelogram (pair of vectors) just as a one-form assigns a number to each small directed line segment (vector), we can integrate two-forms over a region $R$ in the plane much as we integrate one-forms along paths. Given a rectangular region $R$, oriented as shown in figure 8.6, we break it up into $N_xN_y$ small rectangles, then form the Riemann sums


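$$S_{N_xN_y} = \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} \tau(P_{ij})\left[\frac{b-a}{N_x}\mathbf{e}_x, \frac{d-c}{N_y}\mathbf{e}_y\right].$$

If $\tau = F(x, y)\,dx \wedge dy$,

$$S_{N_xN_y} = \frac{b-a}{N_x}\,\frac{d-c}{N_y} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} F(x_i, y_j).$$

We then define the integral of the two-form $\tau$ over the oriented region $R$ as

$$\int_R \tau = \lim_{\substack{N_x\to\infty \\ N_y\to\infty}} S_{N_xN_y}$$

provided the limit is independent of the refinements of the partition.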

We may evaluate the double integral of $F(x, y)\,dx \wedge dy$ over the rectangle $R$ as an iterated integral. To evaluate the expression

$$I = \lim_{\substack{N_x\to\infty \\ N_y\to\infty}} \frac{b-a}{N_x}\,\frac{d-c}{N_y} \sum_{i=0}^{N_x-1} \sum_{j=0}^{N_y-1} F(x_i, y_j)$$

we may first sum over $j$ for each fixed $i$, then let $N_y \to \infty$ before summing over $i$. Since

$$\lim_{N_y\to\infty} \frac{d-c}{N_y} \sum_{j=0}^{N_y-1} F(x_i, y_j) = \int_c^d F(x_i, y)\,dy,$$

we have

$$I = \lim_{N_x\to\infty} \frac{b-a}{N_x} \sum_{i=0}^{N_x-1} \int_c^d F(x_i, y)\,dy = \int_a^b \left(\int_c^d F(x, y)\,dy\right) dx,$$

an iterated integral which can be evaluated by techniques of single-variable calculus. We could equally well have summed first over $i$, then over $j$, to obtain

$$I = \int_c^d \left(\int_a^b F(x, y)\,dx\right) dy.$$
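The convergence of the Riemann sums to the iterated integral can be watched numerically. The following sketch assumes a specific integrand, $F(x, y) = xy^2$ on $[0, 2] \times [0, 3]$, chosen here only because its iterated integral is easy to compute by hand ($2 \cdot 9 = 18$); the function name `riemann_sum` is likewise an assumption.

```python
def riemann_sum(F, a, b, c, d, nx, ny):
    """Riemann sum of F(x, y) dx ^ dy over [a, b] x [c, d], sampling each
    small rectangle at its lower-left corner (x_i, y_j)."""
    hx, hy = (b - a) / nx, (d - c) / ny
    total = 0.0
    for i in range(nx):
        for j in range(ny):
            total += F(a + i * hx, c + j * hy)
    return total * hx * hy


F = lambda x, y: x * y**2

# The iterated integral is (integral of x over [0, 2]) * (integral of y^2 over [0, 3]) = 18.
for n in (10, 100, 400):
    print(n, riemann_sum(F, 0.0, 2.0, 0.0, 3.0, n, n))
```

The printed sums approach 18 from below as the partition is refined, since this $F$ is increasing in both variables and the corner sampled is the lower-left one.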


In evaluating the integral of a two-form $\tau$ over an oriented rectangle $R$, we must pay attention to the orientation of the rectangle. If $R$ is oriented so that $x$ is the 'first' coordinate, $y$ the 'second', as in figure 8.7(a), we write $\tau = F(x, y)\,dx \wedge dy$, then evaluate the iterated integral

$$\int_a^b \left(\int_c^d F(x, y)\,dy\right) dx \quad\text{or}\quad \int_c^d \left(\int_a^b F(x, y)\,dx\right) dy.$$

Figure 8.7 (a) Oriented with x first, (b) Oriented with y first.

If, on the other hand, $R$ is oriented so that $y$ is the 'first' coordinate, as in figure 8.7(b), we must write

$$\tau = G(y, x)\,dy \wedge dx \quad (\text{where } G(y, x) = -F(x, y))$$

and evaluate the iterated integral

$$\int_a^b \left(\int_c^d G(y, x)\,dy\right) dx \quad\text{or}\quad \int_c^d \left(\int_a^b G(y, x)\,dx\right) dy.$$

Reversing the orientation of $R$ changes the sign of the integral.


That the sign of a line integral changes when we reverse the orientation of the path is intuitively clear. (For example, in the case of the force field, the line integral gave the work along the path, a difference in energy, and reversing the path replaces $E_B - E_A$ with $E_A - E_B$.) It is important to have a similar intuitive example for our two-dimensional integrals. Here is one: One way of visualizing a change in orientation in the plane is to look at it from above and from below. That is, suppose that we imagine our $xy$-plane as being the $z = 0$ plane in three-dimensional space. Then a rotation which is clockwise when viewed from above will appear counterclockwise when viewed from below. So choosing an orientation on a surface in space is closely related to choosing a 'side' of the surface. Now imagine that material is flowing through the surface. For instance, imagine that the surface is a piece of a cell membrane and we are interested in the transport of a particular ion across the membrane.

Then, of course, in using the word 'across' we must specify a definite choice of direction, a definite 'side' regarded as 'in', for the surface. Thus in measuring the total flux across the surface, we must choose an orientation. Changing the orientation will change the sign of the total flux.


Double integrals 

Frequently one encounters absolute double integrals, which are to be evaluated over a region in the plane which has no orientation. If, for example, $\sigma$ represents the density (mass per unit area) of a plane lamina in the shape of a rectangle $R$, then the mass of the lamina is given by the double integral

$$M = \int_R \sigma\,dA.$$

Clearly $M$ must be a positive number; orientation of $R$ cannot matter. We may regard $dA$ in such an integral as a function which assigns to any small parallelogram its true geometrical area; that is, the absolute value of its directed area. If we are using $x$ and $y$ as coordinates, we may write $dA = dx\,dy$ or $dA = dy\,dx$; the order of the coordinates does not matter. The absolute integral $\int_R F(x, y)\,dx\,dy$ may be evaluated as the iterated integral

$$\int_a^b \left(\int_c^d F(x, y)\,dy\right) dx$$

or as

$$\int_c^d \left(\int_a^b F(x, y)\,dx\right) dy.$$

Figure 8.9

The important point is that there are two quite distinct types of geometric objects: expressions such as $\sigma\,dA$, which we may call densities, which assign numbers to regions $R$ by integration independent of any orientation, and two-forms, expressions like $\tau = F\,dx \wedge dy$, whose evaluation depends on a choice of orientation. They are each appropriate in quite different physical contexts. As we shall see, they also behave differently under change of variables or pull-back. We shall
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Double integrals as iterated integrals 


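Double integrals can be evaluated as iterated integrals over regions bounded by lines $x = \text{constant}$ and by function graphs which do not cross. For example, in the region in figure 8.10, bounded on the left by $x = a$, on the right by $x = b$, and by the graphs of two functions of $x$ (one of them $y = \phi(x)$), the double integral $\int_R F(x, y)\,dx\,dy$ may be evaluated by integrating first over $y$ between the two graphs, for each fixed $x$, and then over $x$ from $a$ to $b$.

Figure 8.10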
uic uouuic 


As an illustration, we calculate the integral $I = \int_R y\,dx\,dy$ over the region $R$ bounded by the parabola $y = x^2$ and the line $y = x$. This may be evaluated as

$$I = \int_0^1 \left(\int_{x^2}^{x} y\,dy\right) dx = \int_0^1 \left[\tfrac{1}{2}y^2\right]_{x^2}^{x} dx = \tfrac{1}{2}\int_0^1 (x^2 - x^4)\,dx = \tfrac{1}{15}.$$

Alternatively, we may describe the line as $x = y$, the parabola as $x = \sqrt{y}$, and integrate first over $x$:

$$I = \int_0^1 \left(\int_y^{\sqrt{y}} y\,dx\right) dy = \int_0^1 [\sqrt{y} - y]\,y\,dy = \int_0^1 (y^{3/2} - y^2)\,dy = \tfrac{1}{15}.$$

Sometimes it pays to regard an iterated integral as a double integral in order to reverse the order of integration. For example, the integral

$$I = \int_0^1 \left(\int_x^1 e^{-y^2}\,dy\right) dx$$

is unattractive to evaluate as it is written. We can, however, convert it to the double integral

$$I = \int_R e^{-y^2}\,dx\,dy$$

where $R$ is the triangular region bounded by $x = 0$, $y = 1$, and the line $y = x$. This double integral can be evaluated by integrating first with respect to $x$, then with respect to $y$:

$$I = \int_0^1 \left(\int_0^y e^{-y^2}\,dx\right) dy$$

so

$$I = \int_0^1 ye^{-y^2}\,dy = \tfrac{1}{2}\int_0^1 e^{-u}\,du = \tfrac{1}{2}(1 - e^{-1}).$$
jo Jo 

Incidentally, the original integral can be evaluated as it is written. If we define an antiderivative of $e^{-y^2}$ by

$$G(y) = \int_0^y e^{-t^2}\,dt,$$

so that

$$G'(y) = e^{-y^2},$$

then

$$I = \int_0^1 (G(1) - G(x))\,dx.$$

Now integration by parts yields

$$I = [(G(1) - G(x))x]_0^1 - \int_0^1 x(-G'(x))\,dx.$$

The first term vanishes at both limits. Since $G'(x) = e^{-x^2}$, we find that

$$I = \int_0^1 xe^{-x^2}\,dx = \tfrac{1}{2}(1 - e^{-1}),$$

exactly as before.

Sometimes, in order to evaluate a double integral in terms of integrals over $x$ and $y$, it is necessary to divide up the region of integration. For example, to evaluate $\int F(x, y)\,dx\,dy$ over the circular sector shown in figure 8.15, we first divide the sector into regions $R_1$ and $R_2$, then evaluate

$$\int_{R_1} F(x, y)\,dx\,dy + \int_{R_2} F(x, y)\,dx\,dy$$

by converting each integral to an iterated integral. A more natural way to evaluate the same integral is to introduce polar coordinates. We shall discuss this important problem of change of variables in section 8.5.

Figure 8.15


8.4. Orientation 

We have seen that the sign of a line integral depends on the orientation of the path and that of a two-form on the orientation of the plane. We hope that you have an intuitive idea of what orientation means, but suspect that you might feel the need for a precise mathematical definition. That is our purpose in this section.

Before plunging into abstract mathematical definition, let us consider the problem. In the plane, for example, it is intuitively clear that there are two possible orientations:

Figure 8.16

We cannot intrinsically characterize one or another but do know that they are different and that there are only two of them. Similarly for the line:

Figure 8.17

or for three-space when we try to describe right- or left-handedness:

Figure 8.18

Getting back to the plane, we do know (see section 1.5) that a nonsingular matrix $A$ preserves or reverses orientation according as $\operatorname{Det} A$ is positive or negative. This provides us with the clue that we need for the general definition:*

Let $V$ be an abstract two-dimensional vector space. As we saw at the end of Chapter 1, giving a basis of $V$ is the same as giving an isomorphism $L: V \to \mathbb{R}^2$. If $L$ and $L'$ are two such bases, then

$$L' = BL$$

where $B$ is a nonsingular $2 \times 2$ matrix. Let us call $L$ and $L'$ similar if $\operatorname{Det} B > 0$ and opposite if $\operatorname{Det} B < 0$. We claim that the set of all bases of $V$ decomposes into two collections; call them $\mathcal{F}_1$ and $\mathcal{F}_2$. All bases in the collection $\mathcal{F}_1$ are mutually similar, as are all the bases in the collection $\mathcal{F}_2$; and every basis in the collection $\mathcal{F}_1$ is opposite to every basis in the collection $\mathcal{F}_2$. Indeed, pick some basis $L_1$. Let $\mathcal{F}_1$ consist of all bases of the form

$$BL_1, \quad \operatorname{Det} B > 0$$

and let $\mathcal{F}_2$ consist of all bases of the form

$$BL_1, \quad \operatorname{Det} B < 0.$$

Every basis must belong to one or the other of these collections. If $L$ and $L'$ both belong to $\mathcal{F}_1$ then

$$L = BL_1, \quad L' = B'L_1, \quad \operatorname{Det} B > 0, \quad \operatorname{Det} B' > 0$$

so

$$L' = B'B^{-1}L \quad\text{and}\quad \operatorname{Det} B'B^{-1} = \operatorname{Det} B'\,(\operatorname{Det} B)^{-1} > 0,$$

so $L$ and $L'$ are similar. If both $\operatorname{Det} B < 0$ and $\operatorname{Det} B' < 0$ in the above, we shall get that $L$ and $L'$ are similar; while, if one of the determinants is positive and the other negative, we shall get that $L$ and $L'$ are opposite. Thus, if we let $\mathcal{F}$ denote the collection of all bases of $V$, we have

$$\mathcal{F} = \mathcal{F}_1 \cup \mathcal{F}_2, \qquad \mathcal{F}_1 \cap \mathcal{F}_2 = \emptyset.$$


An orientation on $V$ is defined to be a choice of one or the other of these two collections. Put another way, an orientation on $V$ is defined to be a collection of bases of $V$ such that any two bases in the collection are similar and every basis similar to a basis in the collection is in the collection.

Notice that giving a basis, $L$, of $V$ determines an orientation on $V$: the set of all bases similar to $L$.

Once we have chosen an orientation on $V$, then a basis $L$ will be called good or positive if it belongs to the collection and bad or negative if it does not. Thus once we have chosen an orientation, every basis is either good or bad. (Of course, if we had chosen the opposite orientation, these appellations would be reversed.)
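The splitting of bases into two classes can be illustrated concretely. The sketch below assumes $V = \mathbb{R}^2$ and represents a basis by the $2 \times 2$ matrix of the corresponding isomorphism $L$; the helper names and the sample matrices are hypothetical choices made for illustration.

```python
def det2(m):
    """Determinant of a 2x2 matrix given as ((a, b), (c, d))."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inverse2(m):
    """Inverse of a nonsingular 2x2 matrix."""
    d = det2(m)
    return ((m[1][1] / d, -m[0][1] / d), (-m[1][0] / d, m[0][0] / d))


def matmul2(p, q):
    """Product of two 2x2 matrices."""
    return tuple(
        tuple(sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def similar(L, L_prime):
    """True when L' = B L with Det B > 0, where B = L' L^{-1}."""
    B = matmul2(L_prime, inverse2(L))
    return det2(B) > 0


L1 = ((1.0, 0.0), (0.0, 1.0))    # reference basis
L2 = ((0.0, -1.0), (1.0, 0.0))   # a rotation of L1: determinant +1
L3 = ((0.0, 1.0), (1.0, 0.0))    # a swap: determinant -1
L4 = ((1.0, 0.0), (0.0, -1.0))   # a reflection: determinant -1

print(similar(L1, L2), similar(L1, L3), similar(L3, L4))
```

Here `L1` and `L2` land in one collection and `L3` and `L4` in the other, so any two bases opposite to `L1` are similar to each other.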

Let $W$ be a second two-dimensional vector space and let $A: V \to W$ be a linear isomorphism. That is, $A$ has an inverse $A^{-1}: W \to V$. Suppose that we have chosen an orientation $\mathcal{O}_V$ on $V$ and an orientation $\mathcal{O}_W$ on $W$. Let $M \in \mathcal{O}_W$ be a good basis of $W$. Now $M: W \to \mathbb{R}^2$, so $M \circ A: V \to \mathbb{R}^2$ is an isomorphism, hence is a basis of $V$. So there are now two possibilities: $M \circ A$ is either good or bad. We can put this alternative another way. Let $L$ be a good basis of $V$.

Figure 8.21

Then $C = MAL^{-1}$ is the change of basis matrix between $MA$ and $L$; in other words

$$MA = CL.$$

So $MA$ is good if and only if $\operatorname{Det} C > 0$. If we replace $L$ by $L' = B_1L$ ($\operatorname{Det} B_1 > 0$) and $M$ by $M' = B_2M$ ($\operatorname{Det} B_2 > 0$), then

$$C' = M'AL'^{-1} = B_2MAL^{-1}B_1^{-1} = B_2CB_1^{-1}$$

so

$$\operatorname{Det} C' = \operatorname{Det} B_2\,(\operatorname{Det} B_1)^{-1} \operatorname{Det} C$$

has the same sign as $\operatorname{Det} C$. Thus the question of whether $\operatorname{Det} C$ is positive or negative is independent of the particular choice of $L \in \mathcal{O}_V$ and $M \in \mathcal{O}_W$.

If $\operatorname{Det} C > 0$ we say that $A$ is orientation-preserving (or positive). If $\operatorname{Det} C < 0$ we say that $A$ is orientation-reversing (or negative). Suppose that $A: V \to W$ and $A': W \to Z$ are two linear isomorphisms and we have chosen orientations on each of the three spaces. Then it is easy to check that

If $A$ and $A'$ are orientation-preserving, so is $A' \circ A$;

If $A$ and $A'$ are both orientation-reversing, then $A' \circ A$ is orientation-preserving;

If one is orientation-preserving and the other is orientation-reversing, then $A' \circ A$ is orientation-reversing.

Let $\phi: V \to W$ be a differentiable map. Then, at each $p \in V$, $d\phi_p: V \to W$ is a linear map. We say that $\phi$ is orientation-preserving if $d\phi_p$ is an orientation-preserving linear map for every $p$. (In particular, we assume that $d\phi_p$ is a linear isomorphism for each $p$.)

In our definition, we have assumed that $V$ was two-dimensional. This is irrelevant. For example, if $V$ is three-dimensional, the same definitions work. We merely need to know that a $3 \times 3$ matrix is invertible if and only if its determinant is not zero and that any three linearly independent vectors in a three-dimensional space $V$ form a basis of $V$, hence an isomorphism with $\mathbb{R}^3$. Then the discussion at the end of Chapter 1 and the definitions in this section carry over. In $n$ dimensions we need the notions of a basis and of the
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determinant of an n x n matrix. We will discuss these topics in Chapters 10 and 
11. ( In one dimension, a basis is just a non-zero vector, 1 x 1 matrix (a) is just a 
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8.5. Pullback and integration for two-forms 

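The usual motivation for introducing new coordinates in a double integral is to simplify the integration. For example, the region $W$ shown in the $xy$-plane can be expressed as the image of a rectangle $R$ in the $r\theta$-plane by making the familiar polar coordinate transformation $\alpha$ defined by $\alpha^*x = r\cos\theta$, $\alpha^*y = r\sin\theta$. We can use this transformation to convert a directed double integral $\int_W \tau$ into the integral of a suitable two-form in the $r\theta$-plane. This is achieved by defining the pullback $\alpha^*\tau$ in such a way that

$$\int_{\alpha(R)} \tau = \int_R \alpha^*\tau.$$

Figure 8.22

The definition of the pullback of a two-form is

$$\alpha^*\tau(\mathbf{p})[\mathbf{v}, \mathbf{w}] = \tau(\alpha(\mathbf{p}))[d\alpha[\mathbf{v}], d\alpha[\mathbf{w}]].$$

We now show that, with this definition, the integral of $\tau$ over $\alpha(R)$ equals the integral of $\alpha^*\tau$ over the rectangular region $R$, provided of course that $\alpha$ is differentiable and orientation preserving and that both integrals exist. We approximate $\int_R \alpha^*\tau$ as a Riemann sum over many small rectangles: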



Figure 8.24 


The contribution of the rectangle at point $\mathbf{p}$ is $\alpha^*\tau(\mathbf{p})[h\mathbf{v}, k\mathbf{w}]$. By definition, this equals $\tau(\alpha(\mathbf{p}))[d\alpha[h\mathbf{v}], d\alpha[k\mathbf{w}]]$, that is, the value of $\tau$ on the parallelogram which is the best linear approximation to the image under $\alpha$ of the rectangle defined by $h\mathbf{v}$ and $k\mathbf{w}$.

Of course, the image of the rectangle under $\alpha$ is not precisely a parallelogram, and the value

$$\tau(\alpha(\mathbf{p}))[d\alpha(h\mathbf{v}), d\alpha(k\mathbf{w})]$$

does not precisely equal the integral of $\tau$ over the image of the rectangle.

Figure 8.25

So we make two types of error: replacing the image $\alpha(\text{rect.})$ by a parallelogram, call it $P$, and so

(i) replacing $\int_{\alpha(\text{rect.})} \tau$ by $\int_P \tau$;

then

(ii) replacing $\int_P \tau$ by $\tau(\alpha(\mathbf{p}))[d\alpha(h\mathbf{v}), d\alpha(k\mathbf{w})]$.

Now, if $\tau$ is continuous, the error involved in (ii) is clearly $o(hk)$: if $\tau$ had uniformly bounded first derivatives on the entire region, bounded by $K$ say, then

$$|\tau(\mathbf{q}) - \tau(\alpha(\mathbf{p}))| \le K(h^2 + k^2)^{1/2} \quad\text{for any } \mathbf{q} \text{ in } P.$$

Thus the error involved in (ii) is at most

$$K(h^2 + k^2)^{1/2}hk.$$

The error involved in (i) can be estimated by Taylor's formula; for example, replacing the curved image of each side by an approximating straight line. The error here (assuming the first and second derivatives of $\alpha$ are bounded over $R$) will be a sum of terms bounded by $h^2k$ and $hk^2$ multiplied by a suitable constant, thus

$$\alpha^*(\tau(\mathbf{p}_i))[h\mathbf{v}, k\mathbf{w}] = \int_{\alpha(\text{rect.})} \tau + \text{error}$$

where

$$|\text{error}| \le C(h^2 + k^2)^{1/2} \times (\text{the area of the rectangle}).$$

Summing over all rectangles, we get

$$\sum_i \alpha^*(\tau(\mathbf{p}_i))[h\mathbf{v}, k\mathbf{w}] = \int_{\alpha(R)} \tau + \text{error}$$

where

$$|\text{error}| \le C(h^2 + k^2)^{1/2} \times (\text{the area of } R).$$

As we make the mesh finer and finer, the sum on the left approaches $\int_R \alpha^*\tau$ while the error on the right approaches zero. It follows that

$$\int_R \alpha^*\tau = \int_{\alpha(R)} \tau.$$
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first calculate a*(/l a oj, where X and a are one-forms. By definition, 

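$$\alpha^*(\lambda \wedge \sigma)[v, w] = (\lambda \wedge \sigma)[d\alpha[v], d\alpha[w]] = \lambda[d\alpha[v]]\,\sigma[d\alpha[w]] - \lambda[d\alpha[w]]\,\sigma[d\alpha[v]].$$

On the other hand,

$$(\alpha^*\lambda \wedge \alpha^*\sigma)[v, w] = \alpha^*\lambda[v]\,\alpha^*\sigma[w] - \alpha^*\lambda[w]\,\alpha^*\sigma[v].$$

By the definition of pullback for a one-form, α*λ[v] = λ[dα[v]]. It follows then that

$$\alpha^*(\lambda \wedge \sigma) = \alpha^*\lambda \wedge \alpha^*\sigma$$

that is, pullback commutes with the wedge product.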

Since the most general two-form in the xy-plane is of the form

$$\tau = f\,dx \wedge dy$$

we find immediately that

$$\alpha^*\tau = (\alpha^*f)\, d(\alpha^*x) \wedge d(\alpha^*y).$$

If, for example, α*x = r cos θ, α*y = r sin θ, then

$$d(\alpha^*x) \wedge d(\alpha^*y) = (\cos\theta\,dr - r\sin\theta\,d\theta) \wedge (\sin\theta\,dr + r\cos\theta\,d\theta) = r\,dr \wedge d\theta .$$






coordinates:

$$\int_R (\alpha^*F)(r,\theta)\, r\,dr \wedge d\theta = \int_{\alpha(R)} F(x,y)\,dx \wedge dy .$$

The two-form r dr ∧ dθ assigns to any small parallelogram in the rθ-plane not
its directed area (dr ∧ dθ does that) but rather the directed area of its image in the
xy-plane under the transformation α.

We can now establish the general change of variables formula for directed double
integrals. Let R be an oriented region in the uv-plane which is carried by the
differentiable transformation α into the oriented region α(R) in the xy-plane. We
describe α by specifying the pullback of the coordinate functions x and y:

$$\alpha^*x = X(u,v), \qquad \alpha^*y = Y(u,v)$$

so that

$$d(\alpha^*x) = dX = \frac{\partial X}{\partial u}\,du + \frac{\partial X}{\partial v}\,dv, \qquad d(\alpha^*y) = dY = \frac{\partial Y}{\partial u}\,du + \frac{\partial Y}{\partial v}\,dv .$$

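The two-form τ on the xy-plane may be expressed as τ = f dx ∧ dy = F(x, y) dx ∧ dy.

Its pullback is

$$\alpha^*\tau = \alpha^*f\, d(\alpha^*x) \wedge d(\alpha^*y) = \alpha^*f \left[\frac{\partial(\alpha^*x)}{\partial u}\frac{\partial(\alpha^*y)}{\partial v} - \frac{\partial(\alpha^*x)}{\partial v}\frac{\partial(\alpha^*y)}{\partial u}\right] du \wedge dv$$

or, equivalently,

$$\alpha^*\tau = F(X(u,v), Y(u,v)) \left[\frac{\partial X}{\partial u}\frac{\partial Y}{\partial v} - \frac{\partial X}{\partial v}\frac{\partial Y}{\partial u}\right] du \wedge dv .$$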

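We recognize the factor in square brackets as the determinant of the Jacobian
matrix J which represents dα relative to the given coordinates,

$$J = \begin{pmatrix} \dfrac{\partial(\alpha^*x)}{\partial u} & \dfrac{\partial(\alpha^*x)}{\partial v} \\[2ex] \dfrac{\partial(\alpha^*y)}{\partial u} & \dfrac{\partial(\alpha^*y)}{\partial v} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial X}{\partial u} & \dfrac{\partial X}{\partial v} \\[2ex] \dfrac{\partial Y}{\partial u} & \dfrac{\partial Y}{\partial v} \end{pmatrix}$$

so that we may write

$$\int_{\alpha(R)} \tau = \int_R \alpha^*f\,(\operatorname{Det} J)\, du \wedge dv .$$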


This is entirely reasonable. Since Det J is the ‘area-transforming’ factor for the
linear transformation dα, Det J du ∧ dv assigns to any small region in the uv-plane
the directed area of its image in the xy-plane. If the ordering of u and v has been
determined by the orientation of R and the ordering of x and y by the orientation
of α(R), and if α is orientation-preserving, then the Jacobian matrix J will have a
positive determinant. Reversing the order of u and v, or of x and y, corresponds
to a change in orientation in the uv- or xy-planes. It will interchange columns or
rows of J and thereby change the sign of Det J.
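The behaviour of Det J under a change in the order of coordinates can be checked numerically. The following is only a sketch: the helper name `jacobian_det`, the step size, and the sample point are our own illustrative choices, and the partial derivatives are estimated by central differences rather than computed exactly. It evaluates Det J for the polar coordinate map with (r, θ) as the ordered coordinates, and again with the order reversed.

```python
import math

def jacobian_det(X, Y, u, v, h=1e-6):
    """Central-difference estimate of Det J for the map (u, v) -> (X(u, v), Y(u, v))."""
    dX_du = (X(u + h, v) - X(u - h, v)) / (2 * h)
    dX_dv = (X(u, v + h) - X(u, v - h)) / (2 * h)
    dY_du = (Y(u + h, v) - Y(u - h, v)) / (2 * h)
    dY_dv = (Y(u, v + h) - Y(u, v - h)) / (2 * h)
    return dX_du * dY_dv - dX_dv * dY_du

# Polar coordinates with r taken as the first coordinate: Det J should be close to r.
polar_det = jacobian_det(lambda r, t: r * math.cos(t),
                         lambda r, t: r * math.sin(t), 2.0, 0.7)

# The same map with theta taken as the first coordinate: the columns of J are
# interchanged, so Det J should be close to -r.
swapped_det = jacobian_det(lambda t, r: r * math.cos(t),
                           lambda t, r: r * math.sin(t), 0.7, 2.0)

print(polar_det, swapped_det)
```

The two printed values differ only in sign, as the interchange of columns of J requires.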

As an illustration of the change of variables formula, we calculate the area of
the oriented region W bounded by the x-axis, the line y = mx, the hyperbola
x² − y² = 1, and the hyperbola x² − y² = 4. To achieve this we write W = α(R)
where α*x = u cosh v, α*y = u sinh v. Then α maps the oriented rectangle R, defined
by 1 ≤ u ≤ 2, 0 ≤ v ≤ arctanh m, into the region W. For example, the vertical segment
u = 2 is mapped into a portion of the hyperbola x² − y² = 4.



Since W has the ‘x-first’ orientation, its area is A = ∫_W dx ∧ dy. Pulling back, we
have A = ∫_R α*(dx ∧ dy) = ∫_R Det J du ∧ dv. Here α*(dx ∧ dy) = d(u cosh v) ∧
d(u sinh v) = (cosh v du + u sinh v dv) ∧ (sinh v du + u cosh v dv) = u cosh² v du ∧ dv +
u sinh² v dv ∧ du = u du ∧ dv so that

$$A = \int_R u\,du \wedge dv = \int_1^2 u\,du \int_0^{\tanh^{-1} m} dv = \tfrac{3}{2}\tanh^{-1} m .$$
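As a rough numerical cross-check of this area (a sketch under our own assumptions, not part of the derivation), one can count midpoints of a fine grid in the xy-plane that fall inside W and compare with (3/2) tanh⁻¹ m, the value obtained from ∫ u du = 3/2 over 1 ≤ u ≤ 2 and ∫ dv = tanh⁻¹ m. The function name `area_of_W`, the grid size, and the choice m = 0.5 are illustrative, and the code assumes 0 < m < 1.

```python
import math

def area_of_W(m, n=600):
    """Midpoint-grid estimate of the area of the region
    0 <= y <= m*x, 1 <= x^2 - y^2 <= 4 (assumes 0 < m < 1)."""
    # Corner where the line y = m*x meets the hyperbola x^2 - y^2 = 4;
    # it has the largest x and the largest y in the region.
    x_max = 2.0 / math.sqrt(1.0 - m * m)
    y_max = m * x_max
    dx, dy = x_max / n, y_max / n
    count = 0
    for i in range(n):
        x = (i + 0.5) * dx
        for j in range(n):
            y = (j + 0.5) * dy
            if y <= m * x and 1.0 <= x * x - y * y <= 4.0:
                count += 1
    return count * dx * dy

m = 0.5
grid_area = area_of_W(m)
formula_area = 1.5 * math.atanh(m)
print(grid_area, formula_area)
```

The grid estimate is crude near the boundary curves, so agreement should only be expected to a couple of decimal places.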


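Equivalently, we may compute

$$\operatorname{Det} J = \operatorname{Det}\begin{pmatrix} \cosh v & u \sinh v \\ \sinh v & u \cosh v \end{pmatrix} = u\cosh^2 v - u\sinh^2 v = u .$$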

Clearly the secret of a useful coordinate transformation is to make the boundary
of the region W be the image of the sides of a rectangle in the uv-plane. If, for example,
W is the triangle bounded by the coordinate axes and the line x + y = 1, a useful
coordinate transformation will be one which carries the lines u = constant, for a
fixed interval in v, into segments x + y = constant between the coordinate axes. Such
a coordinate transformation is described by α*x = uv, α*y = u(1 − v), which has the
property that α*(x + y) = u. You should convince yourself that α carries the unit
square R in the uv-plane into the region W, but that R must be given the ‘v-first’
orientation in order to make the orientation of α(R) agree with the ‘x-first’
orientation of W. (Confirmation of this fact is that, when v is taken as the first
coordinate, the Jacobian

$$J = \begin{pmatrix} \dfrac{\partial(\alpha^*x)}{\partial v} & \dfrac{\partial(\alpha^*x)}{\partial u} \\[2ex] \dfrac{\partial(\alpha^*y)}{\partial v} & \dfrac{\partial(\alpha^*y)}{\partial u} \end{pmatrix} = \begin{pmatrix} u & v \\ -u & 1 - v \end{pmatrix}$$

has determinant equal to u.)


We may use this coordinate transformation to evaluate the integral

$$I = \int_W \frac{e^{-(x+y)}}{\sqrt{xy}}\, dx \wedge dy$$

which would be very difficult as an iterated integral over x and y. We find

$$\alpha^*\!\left(\frac{e^{-(x+y)}}{\sqrt{xy}}\right) = \frac{e^{-u}}{\sqrt{u^2 v(1-v)}},$$

and α*(dx ∧ dy) = (u dv + v du) ∧ (−u dv + (1 − v) du) = u dv ∧ du, so that the integral
in the uv-plane is simply

$$I = \int_R \frac{e^{-u}}{\sqrt{v(1-v)}}\, dv \wedge du .$$

Note that, because R has the ‘v-first’ orientation, we write dv ∧ du, not
its negative, before converting to an iterated integral. The final result is

$$I = \int_0^1 e^{-u}\,du \int_0^1 \frac{dv}{\sqrt{v(1-v)}} = (1 - e^{-1})\pi .$$

(In evaluating the second integral we used the fact that

$$\int \frac{dv}{\sqrt{v(1-v)}} = -\arcsin(-2v + 1).)$$
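The value of this integral can be checked numerically from the iterated form in the uv-plane, where the integrand separates into a factor depending on u and a factor depending on v. The sketch below uses a plain composite midpoint rule, which is our own choice and converges only slowly for the v integral because the integrand is unbounded at v = 0 and v = 1; the helper name `midpoint` and the numbers of subintervals are illustrative.

```python
import math

def midpoint(f, a, b, n):
    """Composite midpoint rule; it never evaluates f at the endpoints a and b."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Factor depending on u: integral of e^(-u) over 0 <= u <= 1.
u_part = midpoint(lambda u: math.exp(-u), 0.0, 1.0, 1000)

# Factor depending on v: integral of 1/sqrt(v(1 - v)) over 0 <= v <= 1.
v_part = midpoint(lambda v: 1.0 / math.sqrt(v * (1.0 - v)), 0.0, 1.0, 200000)

numeric_I = u_part * v_part
exact_I = (1.0 - math.exp(-1.0)) * math.pi
print(numeric_I, exact_I)
```

The two values agree only to about two decimal places at this resolution, which reflects the endpoint singularities rather than any error in the change of variables.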


We turn finally to the question of changing variables in an absolute double
integral, I = ∫_W f dx dy. To make such a variable change, we may first convert I to a
directed double integral I = ∫_W f dx ∧ dy, giving W the ‘x-first’ orientation. We next



Figure 8.30 


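write W as α(R); this procedure assigns an orientation to R. If Det J is positive,
R acquires the ‘u-first’ orientation and

$$I = \int_R \alpha^*f\,(\operatorname{Det} J)\, du \wedge dv = \int_R \alpha^*f\,(\operatorname{Det} J)\, du\,dv .$$

If Det J is negative, R acquires the ‘v-first’ orientation and

$$I = \int_{R\ (v\ \text{first})} \alpha^*f\,(\operatorname{Det} J)\, du \wedge dv = -\int_{R\ (\text{unoriented})} \alpha^*f\,(\operatorname{Det} J)\, du\,dv .$$

In either case, the rule is to use the absolute value of Det J:

$$\int_W f\,dx\,dy = \int_R \alpha^*f\,|\operatorname{Det} J|\, du\,dv .$$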


When this rule is used, questions of orientation, or of the order of coordinates, never 
arise; interchanging x and y, or u and v, does not affect the absolute value of Det J. 
In Chapter 15 we will discuss integration of forms in higher dimensions. 


8.6. Two-forms in three-space 

In the preceding section, we defined pullback for two-forms. The computational rule
was very simple: if ω₁ and ω₂ are linear differential forms, then

$$\phi^*(\omega_1 \wedge \omega_2) = \phi^*\omega_1 \wedge \phi^*\omega_2 .$$

If τ₁ and τ₂ are two-forms, then

$$\phi^*(\tau_1 + \tau_2) = \phi^*\tau_1 + \phi^*\tau_2 .$$

In short, pullback preserves the algebraic operations. We also defined the operator d,
going from one-forms to two-forms, by

$$d(f\,dg) = df \wedge dg$$

or, more generally,

$$d(f\omega) = df \wedge \omega + f\,d\omega$$

and

$$d(\omega_1 + \omega_2) = d\omega_1 + d\omega_2 .$$

The pullback φ* commutes with d in the sense that

φ*(df) = d(φ*f), f a function

and

φ*dω = d(φ*ω), ω a one-form.

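The notion of a two-form makes perfectly good sense in ℝ³: a typical two-form
in ℝ³ (where the coordinates are x, y, z) is an expression of the form

$$a\,dx \wedge dy + b\,dx \wedge dz + c\,dy \wedge dz,$$

where a, b and c are functions. If

$$\omega = A\,dx + B\,dy + C\,dz$$

is a one-form, then the rules for d and for exterior multiplication give

$$\begin{aligned}
d\omega &= dA \wedge dx + dB \wedge dy + dC \wedge dz \\
&= \left(\frac{\partial A}{\partial x}dx + \frac{\partial A}{\partial y}dy + \frac{\partial A}{\partial z}dz\right) \wedge dx
 + \left(\frac{\partial B}{\partial x}dx + \frac{\partial B}{\partial y}dy + \frac{\partial B}{\partial z}dz\right) \wedge dy \\
&\qquad + \left(\frac{\partial C}{\partial x}dx + \frac{\partial C}{\partial y}dy + \frac{\partial C}{\partial z}dz\right) \wedge dz \\
&= \left(\frac{\partial B}{\partial x} - \frac{\partial A}{\partial y}\right) dx \wedge dy
 + \left(\frac{\partial C}{\partial x} - \frac{\partial A}{\partial z}\right) dx \wedge dz
 + \left(\frac{\partial C}{\partial y} - \frac{\partial B}{\partial z}\right) dy \wedge dz .
\end{aligned}$$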

If φ: ℝ² → ℝ³ and τ is a two-form in ℝ³, then φ*τ is a two-form on ℝ². If R is some
region in ℝ² and we have chosen an orientation on ℝ², then we can form the integral
∫_R φ*τ which we might think of as the ‘integral of τ over the oriented surface φ(R)’.






invertible, then

$$\int_R \alpha^*(f\,dx\,dy) = \int_{\alpha(R)} f\,dx\,dy$$

without any conditions on orientation.

So two-forms, τ = f dx ∧ dy, and densities like f dx dy are quite different objects -
they transform differently under change of variables. For example, a density can be
positive or negative (as in a density of electric charge): if f > 0 then making a change
of variables replaces f dx dy by α*(f)|Det J| du dv and α*f · |Det J| is still positive (if
Det J ≠ 0, which will hold if α⁻¹ is differentiable). But it makes no sense to ask
whether a two-form τ is positive or negative - since the factor Det J which enters
into (8.2) can be positive or negative. It is only when we choose an orientation (and

of variable - those for which




8.8. Green’s theorem in the plane 


In considering line integrals we have encountered one generalization of the
fundamental theorem of the calculus, namely

$$\int_\gamma df = f(B) - f(A)$$

where the path γ runs from A to B. This theorem relates the integral of df over a
one-dimensional region (the path γ) to the values of f itself on the boundary of the path
(the endpoints of the path).


Figure 8.32


A similar result involving a two-dimensional region and its one-dimensional
boundary is known as Green’s theorem. This theorem states that, for any
differentiable one-form τ and any oriented region R in the plane,

$$\int_R d\tau = \int_{\partial R} \tau .$$

Here the integral on the left is the integral of the two-form dτ over the region R, while
the integral on the right is the integral of the one-form τ over the path ∂R which is the
boundary of R. The sense in which the path ∂R is to be traversed is determined by the
orientation of R. For example, if R is an annular region with a counterclockwise
orientation, as shown in figure 8.33, then ∂R consists of the outer bounding circle
traversed counterclockwise and the inner bounding circle traversed clockwise. If R
were given a clockwise orientation, then ∂R would consist of the same two circular
paths, but each traversed in the opposite sense.





Before proving Green’s theorem formally, it is worth reviewing the definition of
the operator d acting on a one-form in order to see why such a theorem ought to
hold. Recall that we defined dτ as an antisymmetric bilinear function on a pair of
vectors v, w with the property that

$$d\tau(P)[hv, kw] = \int_{\partial P(h,k)} \tau + \text{error}$$

where P(h, k) is the parallelogram spanned by vectors hv and kw, and the error term
goes to zero faster than the product hk (faster than the area of the parallelogram). If
a region R is a union of N parallelograms, each spanned by
vectors hv and kw, we have

$$\sum_{i=1}^{N} d\tau(P_i)[hv, kw] = \sum_{i=1}^{N} \int_{\partial P_i(h,k)} \tau + \sum_{i=1}^{N} (\text{error})_i .$$

In the sum of line integrals over the parallelograms, the contributions from the
interior segments, each of which is common to two parallelograms, cancel, since
each segment appears once with each orientation. Thus all that remains is a single
line integral around the boundary of R, and we have

$$\sum_{i=1}^{N} d\tau(P_i)[hv, kw] = \int_{\partial R} \tau + \sum_{i=1}^{N} (\text{error})_i .$$

Now, as h and k approach zero, the sum on the left side approaches the integral
∫_R dτ. Since N is proportional to 1/hk, while each error term goes to zero faster
than hk, the sum of the error terms approaches zero as h and k approach zero,
and we have

$$\int_R d\tau = \int_{\partial R} \tau$$

for any region which is a union of parallelograms.
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Figure 8.35 

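side x = a, traversed from y = 0 to y = b, and from the side x = 0, traversed from
y = b to y = 0. Thus

$$\int_{\partial R} \tau = \int_0^b G(a,y)\,dy + \int_b^0 G(0,y)\,dy = \int_0^b G(a,y)\,dy - \int_0^b G(0,y)\,dy .$$

But, by the fundamental theorem of calculus,

$$G(a,y) - G(0,y) = \int_0^a \frac{\partial G}{\partial x}(x,y)\,dx .$$

We may therefore express ∫_{∂R} τ as the iterated integral

$$\int_0^b \left( \int_0^a \frac{\partial G}{\partial x}(x,y)\,dx \right) dy$$

which in turn is equal to the directed double integral

$$\int_R \frac{\partial G}{\partial x}(x,y)\,dx \wedge dy .$$

A similar argument, applied to the one-form F(x,y) dx, yields the result

$$\int_{\partial R} F(x,y)\,dx = -\int_0^a \bigl(F(x,b) - F(x,0)\bigr)\,dx = -\int_0^a \left( \int_0^b \frac{\partial F}{\partial y}(x,y)\,dy \right) dx = -\int_R \frac{\partial F}{\partial y}(x,y)\,dx \wedge dy .$$

But, of course, if

$$\tau = F(x,y)\,dx + G(x,y)\,dy$$

then

$$d\tau = -\frac{\partial F}{\partial y}(x,y)\,dx \wedge dy + \frac{\partial G}{\partial x}(x,y)\,dx \wedge dy$$

so we have again proved that

$$\int_{\partial R} \tau = \int_R d\tau ,$$

which is Green’s theorem.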

We can now extend the proof of Green’s theorem to any region in the plane which
is the image of a rectangle under a smooth transformation α. The strategy is familiar:
we pull back the integrals ∫_{∂[α(R)]} τ and ∫_{α(R)} dτ to the st-plane, in which the region of
integration is just a rectangle:

Figure 8.36

It is clear that, if α is continuous, then the boundary of R is carried into the
boundary of α(R), so that

$$\int_{\partial[\alpha(R)]} \tau = \int_{\partial R} \alpha^*\tau .$$

But for the rectangular region R we have already proved that

$$\int_{\partial R} \alpha^*\tau = \int_R d(\alpha^*\tau).$$

Furthermore,

$$\int_{\alpha(R)} d\tau = \int_R \alpha^*(d\tau)$$

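by the definition of pullback. To prove that ∫_{∂[α(R)]} τ = ∫_{α(R)} dτ, therefore, we need only
to show that

$$\alpha^*(d\tau) = d(\alpha^*\tau).$$

Using the rule α*(dx) = d(α*x), α*(dy) = d(α*y), we have

$$\alpha^*\tau = (\alpha^*f)\,d(\alpha^*x) + (\alpha^*g)\,d(\alpha^*y).$$

Then, using the rule d(f dh) = df ∧ dh, we have

$$d(\alpha^*\tau) = d(\alpha^*f) \wedge d(\alpha^*x) + d(\alpha^*g) \wedge d(\alpha^*y).$$

On the other hand, we know that

$$d\tau = df \wedge dx + dg \wedge dy .$$

Using the rule α*(σ ∧ ω) = α*σ ∧ α*ω, we have

$$\alpha^*d\tau = \alpha^*(df) \wedge \alpha^*(dx) + \alpha^*(dg) \wedge \alpha^*(dy)$$

so that

$$\alpha^*d\tau = d(\alpha^*f) \wedge d(\alpha^*x) + d(\alpha^*g) \wedge d(\alpha^*y).$$

Comparing with (8.2) above, we see that

$$d(\alpha^*\tau) = \alpha^*(d\tau).$$

Thus we have

$$\int_{\alpha(\partial R)} \tau = \int_{\partial R} \alpha^*\tau = \int_R d(\alpha^*\tau) = \int_R \alpha^*(d\tau) = \int_{\alpha(R)} d\tau$$

which proves Green’s theorem for a region which is the image of a rectangle.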

We have proved Green’s theorem for a region which is the image of a rectangle. 


Indeed, we can 


2 , every polygon can be decomposed into triangles. 
nv polygon into convex polygons: _ 


Figure 8.37 

Any convex polygon can be decomposed into triangles by simply choosing a point 
in the interior and joining it to all the vertices: 
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Figure 8.38 
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We thus need only to prove the theorem for triangles. Since any triangle can 
be mapped into any other by an affine transformation, the invariance of the 
integrals under smooth (in particular, affine) transformations means that it is 
enough to prove it for a single triangle. So consider the triangle T₀ = {0 ≤ x ≤ y ≤ 1}
in the xy-plane.



Figure 8.40

to the limit, if we know Green’s theorem for T_ε, ε > 0, it will follow for the triangle
T₀, since the line and area integrals around the little tip become vanishingly small.
But T_ε for ε > 0 is the image of the rectangle in the uv-plane
under the map

u = x/y, v = y,

so long as y ≥ ε > 0. Hence we have reduced the theorem to a case we already
know - the image of a rectangle. QED







As an illustration of Green’s theorem, we consider the integral of the one-form

τ = x² dy

Figure 8.42

over the closed path ∂R formed by the line segment from (0, 0) to (1, 0), the parabola
y = 1 − x² from (1, 0) to (0, 1), and the line segment from (0, 1) to (0, 0). The two line
segments contribute nothing to the integral. Parameterizing the parabola by
β*x = 1 − t, β*y = 1 − (1 − t)² = 2t − t², 0 ≤ t ≤ 1,
we find

$$\int_{\partial R} \tau = \int_0^1 (1-t)^2 (2 - 2t)\,dt = \tfrac{1}{2} .$$

According to Green’s theorem, the integral of dτ = 2x dx ∧ dy over the region R
should have the same value. Evaluating this integral as an iterated integral, we
obtain

$$\int_0^1 2x\,dx \int_0^{1-x^2} dy = \int_0^1 2(1 - x^2)\,x\,dx = \tfrac{1}{2} ,$$

as expected.
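Both sides of this illustration can also be evaluated numerically. The sketch below is a check, not a proof: it uses a composite midpoint rule of our own choosing (the helper name `midpoint` and the numbers of subintervals are illustrative) to approximate the line integral of x² dy along the parabola x = 1 − t, y = 2t − t², and the iterated integral of 2x over the region 0 ≤ x ≤ 1, 0 ≤ y ≤ 1 − x².

```python
def midpoint(f, a, b, n=2000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Line integral of x^2 dy along x = 1 - t, y = 2t - t^2, where dy = (2 - 2t) dt.
# The two straight segments of the boundary contribute nothing.
line_integral = midpoint(lambda t: (1.0 - t) ** 2 * (2.0 - 2.0 * t), 0.0, 1.0)

# Iterated integral of 2x: inner integral over 0 <= y <= 1 - x^2,
# outer integral over 0 <= x <= 1.
double_integral = midpoint(
    lambda x: midpoint(lambda y: 2.0 * x, 0.0, 1.0 - x * x, 50), 0.0, 1.0)

print(line_integral, double_integral)
```

The two printed numbers should agree with each other, and with the value 1/2, up to the discretization error of the midpoint rule.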

We can use Green’s theorem to obtain expressions for the area of a region in terms
of line integrals. For example, if τ = x dy, then dτ = dx ∧ dy, and

$$\int_{\partial R} \tau = \int_R dx \wedge dy = \text{area of } R .$$

The same is true if τ is replaced by τ' = x dy + df for any function f,
since

dτ' = d(x dy) + d(df) = dx ∧ dy = dτ.

Choosing f = −½xy, for example, we obtain

τ' = x dy − ½x dy − ½y dx = ½(x dy − y dx).

On introducing polar coordinates by the formulas

α*x = r cos θ, α*y = r sin θ†

we obtain, after some calculation,

α*τ' = ½r² dθ

which leads to the well-known formula

$$A = \int_0^{2\pi} \tfrac{1}{2} r^2\,d\theta$$

for the area of the region bounded by a closed curve which is described in terms of
polar coordinates.
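For a polygon, the line integral of ½(x dy − y dx) around the boundary can be evaluated exactly, edge by edge: along the straight segment from (x₀, y₀) to (x₁, y₁) it equals ½(x₀y₁ − x₁y₀). The following sketch (the function name `polygon_area` and the sample polygons are our own illustrative choices) sums these contributions and shows how the sign follows the orientation of the boundary.

```python
import math

def polygon_area(vertices):
    """Directed area (1/2) * integral of (x dy - y dx) around a closed polygon.
    The result is positive when the vertices are listed counterclockwise."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]   # the last edge returns to the first vertex
        total += x0 * y1 - x1 * y0
    return 0.5 * total

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
square_area = polygon_area(unit_square)           # counterclockwise boundary
reversed_area = polygon_area(unit_square[::-1])   # clockwise boundary

# A regular 1000-gon inscribed in the unit circle approximates the disk of area pi.
circle = [(math.cos(2 * math.pi * k / 1000), math.sin(2 * math.pi * k / 1000))
          for k in range(1000)]
circle_area = polygon_area(circle)

print(square_area, reversed_area, circle_area)
```

Reversing the order of the vertices reverses the sense in which the boundary is traversed, and the computed area changes sign accordingly.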

The basic formulas of this chapter: 
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Summary 

A Two-forms 

Given a differential one-form τ on the plane, you should be able to state the
definition of its exterior derivative dτ and to calculate dτ in terms of coordinates x
and y.

You should know how to define and evaluate the integral of a two-form over an 
oriented rectangular region of the plane. 


† Frequently in applications the α* is dropped, and one writes simply x = r cos θ, y = r sin θ.



B Double integrals 

You should be able to evaluate double’ 


of the plane bv carrvi 





biHatu 


verting it to a aouDie 
iven a transformation from one region of the plane to another, you should be 


i 111 
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C Green’s theorem 

You should be able to state and apply Green’s theorem in the plane. 


Exercises 

8.1. In each of the following cases, u and v are functions on a plane where x and
y are affine coordinates. Express dx ∧ dy in terms of du ∧ dv. Make a
sketch showing typical curves u = constant and v = constant in the first
quadrant (x, y > 0) and try to give a geometric interpretation to the relations
between dx ∧ dy and du ∧ dv by applying both to a parallelogram
whose sides are tangent to u = constant and v = constant respectively.

(a) x = u cos v, y = u sin v.


: = U 2 — i 


8.2. Evaluate ∬_S x²y² dx dy, where S is the bounded portion of the first
quadrant lying between the hyperbolas xy = 1 and xy = 2 and the straight
lines y = x and y = 4x.


(b) Show that jo(jo[jo df]du)dp = ijg(x - 

If you do this in two steps, you never actually have to consider a triple 
integral! 

8.4. Evaluate the iterated integral 


sin nx 


-dx dy 


by expressing it as a double integral over a suitable region W, then 
evaluating the integral as an iterated integral in the opposite order. Make 
a sketch to show the region W. (You may want to consult an integral 
table if you find the evaluation of the single integrals hard.) 

8.5. Consider the mapping defined by the equations

x = u + v, y = v − u².

(a) Compute the Jacobian determinant of this mapping as a function of u
and v.



(b) A triangle 7 


(0,2). Sketch its 

image S in 1 

:ne xy-piane. 















(b) Evaluate the same integral by using coordinates u and v related to x and y 
by x = 2 uv, y = u — uv. 

8.8. Evaluate 
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1 / rt/v 


u 5 v 9 du Idu 


(b) Interpret this integral as a double integral over a suitable region in the uv- 
plane. Draw a picture of this region labelling its boundary curves clearly. 
(Do not be concerned, here or later, by the fact that the region is 
unbounded.) 

(c) Reinterpret the double integral as an iterated integral in the other order, 
and evaluate this integral. 

(d) Make the substitution u = x²y⁻³, v = x⁻¹y² in the double integral,
obtaining a new double integral in the xy-plane. Draw a picture of the
domain of this new integral.

(e) Convert the new double integral to an iterated integral and evaluate it.

(f) Show that x, y are differentiable coordinates on the whole of the (open)
first quadrant of the uv-plane.

.10.(a) Evaluat e 


IT 

' f* 

1 

ydAr^ 


where Q is the first quadrant of the unit disk, by converting it to an iterated
integral in x and y. (‘A’ refers to the usual area in the xy-plane.)

(b) Introduce polar coordinates r and θ into the xy-plane as usual and convert





8.11.(a) Evaluate the integral I = ∫_W 2y dx ∧ dy as the sum of two iterated integrals
in the xy-plane. The region W is bounded by the lines y = ½x and y = 2x
and the hyperbolas xy = 2 and xy = 8.

(Figure: the region W, bounded by y = 2x, y = ½x, xy = 8 and xy = 2.)

(b) Find a rectangle R in the uv-plane such that W is the image of R under the
transformation α described by α*x = 2uv, α*y = u/v.

(c) Calculate α*(2y dx ∧ dy).

(d) Evaluate the integral I as an integral over the region R.

8.12.(a) Evaluate the line integral ∫(2y² + 3x) dx + 2xy dy over the curve γ shown
in figure 8.46, which consists of the line segments 0 ≤ x ≤ 2 and 0 ≤ y ≤ 2
and the circular arc x² + y² = 4 for x ≥ 0, y ≥ 0.

(b) Construct a double integral over the region bounded by γ which must be
equal to the line integral in (a). Evaluate this double integral by
transforming to polar coordinates.


4 y 



(c) Find a function f(x) with the property that 



f* - 

r 1,2 i T-yArK- _L_ O_ 
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when the integral is evaluated around any closed curve in the plane. 

8.13. One way to change coordinates in a directed double integral I =
∫_W f dx ∧ dy, where W = φ(S), is to use Green’s theorem to express I as a
line integral over the closed path ∂W, transform the result to a line integral
in the uv-plane, then use Green’s theorem again to express I as a double
integral in the uv-plane. Use this approach to derive the change of
variables formula for double integrals.

8.14. Let u and v be functions on the plane whose first and second partial
derivatives with respect to x and y are continuous. Let S be a connected
region in the plane with boundary ∂S. Show that
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Chapter 9 presents an example of how the results of the first 
eight chapters can be applied to a physical theory - optics. It 
is all in the nature of applications, and can be omitted without 
any effect on the understanding of what follows. 


9.1. Theories of optics 


In the history of physics it is often the case that, when an older theory is superseded
by a newer one, the older theory retains its validity, either as an approximation
to the newer theory, an approximation that is valid for an interesting range of
circumstances, or as a special case of the newer theory. Thus Newtonian mechanics
can be regarded as an approximation to relativistic mechanics, valid when the
velocities that arise are very small in comparison to the velocity of light. Similarly,
Newtonian mechanics can be regarded as an approximation to quantum mechanics,
valid when the bodies in question are sufficiently large. Kepler’s laws of planetary 
motion are a special case of Newton’s laws, valid for the inverse square law of force 
between two bodies. Kepler’s laws can also be regarded as an approximation to the 
laws of motion derived from Newtonian mechanics when we ignore the effects of the 
planets on each other’s motion. 

The currently held theory of light is known as quantum electrodynamics. It 
describes very successfully and very accurately the interaction of light with charged 
particles, explaining both the discrete character of light, as evinced in the photoelectric
effect, and the wave-like character of electromagnetic radiation. The triumph
of nineteenth century physics was Maxwell’s electromagnetic theory, which was a
self-contained theory explaining electricity, magnetism and electromagnetic radiation.
Maxwell’s theory can be regarded as an approximation to quantum
electrodynamics, valid in that range where it is safe to ignore quantum effects. 
Maxwell’s theory fails to explain a whole range of phenomena that occur at the 
atomic or subatomic level. 





One of Maxwell’s remarkable discoveries was that visible light is a form of
electromagnetic radiation, as is radiant heat. In fact, since Maxwell, optics is a special
chapter of the theory of electricity and magnetism which treats electromagnetic
vibrations of all wavelengths, from the shortest γ rays of radioactive substances
(having a wavelength of one hundred-millionth of a millimeter) through the X-rays,
the ultraviolet, visible light, the infra-red, to the longest radio waves (having a
wavelength of many kilometers). In the flood of invisible light that is accessible to the
mental eye of the physicist, the physiological eye is almost blind, so small is the
interval of vibrations that it converts into sensations.

Maxwell’s theory dealt with the source of electromagnetic radiation as well as its 
propagation. Before Maxwell, there was a fairly well-developed wave theory of light, 
due mainly to Fresnel, which dealt rather successfully with the propagation of light 
in various media, but had nothing to say about the production of light. Fresnel’s 
theory did account for three physical effects which could not be explained by earlier 
theories - diffraction, interference, and polarization. Diffraction has to do with the 
behavior of light in the immediate vicinity of surfaces through which it is transmitted 
or reflected. A typical diffraction effect is the fact that we cannot produce an 
absolutely straight, arbitrarily narrow beam of light. For example, we might try to
produce such a beam by lining up two opaque screens with holes in them, to
collimate light arriving from the left of one of them. When the holes are very small (of
the order of the wavelength of the light), we find that the region to the right of the
second screen is suffused with light, instead of there being a narrow beam.




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Figure 9.1 


‘Interference’ refers to those phenomena where the wave character of light manifests 
itself by the constructive or destructive superposition of light travelling different 
paths. Typical is the famous Young interference experiment illustrated in figure 9.2. 
‘Polarization’ refers to the fact that when light passes through certain materials, it 
appears to acquire a preferred direction in the plane perpendicular to the ray; such 
effects can be observed, for example, by using Polaroid filters. 

Geometrical optics is the approximation to wave optics in which the wave 
character of light is ignored. It is valid whenever the dimensions of the various 
apertures are very large when compared to the wavelength of the light, and when we












do not examine too closely what is happening in the neighborhood of shadows or 
foci. It does not account for diffraction, interference or polarization. 

Linear optics is an approximation to geometrical optics that is valid when the 
various angles which enter into consideration are small. In linear optics one makes 
the approximation sin θ = θ, tan θ = θ, cos θ = 1, etc.; i.e., all expressions which are
quadratic (or of higher order) in the angles are ignored. For example, in geometrical
optics, Snell’s law says that if light passes from a region whose index of refraction
(relative to vacuum) is n, into a region whose index of refraction is n', then n sin i =
n' sin i' where i and i' are the angles that the light ray makes with the normal to the
surface separating the regions. In linear optics we replace this law by the simpler law
ni = n'i', which is a good approximation if i and i' are small. (This approximate law
was known to Ptolemy.) The deviations between geometrical optics and the linear 
optics approximation are known as (geometrical) aberrations. For instance, if a 
bundle of parallel rays is incident on a spherical mirror, a careful examination of the 
reflected rays shows that they do not all intersect at a common point. The rays near 
the diameter do intersect near a common focal point. In linear optics we restrict 







Figure 9.4 Spherical aberration. 
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ourselves to rays close enough to the diameter so that we may assume that there is a 
common focus. (This deviation from focussing for a spherical mirror is a case of
spherical aberration.)
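The quality of the linear approximation to Snell’s law is easy to examine numerically. The sketch below compares the refraction angle given by n sin i = n' sin i' with the one given by ni = n'i' for a few angles of incidence. The function names and the indices n = 1.0, n' = 1.5 are illustrative assumptions of ours, not values taken from the text.

```python
import math

def snell_exact(i, n, n_prime):
    """Refraction angle i' from n sin i = n' sin i' (angles in radians)."""
    return math.asin(n * math.sin(i) / n_prime)

def snell_linear(i, n, n_prime):
    """Refraction angle i' from the linearized law n i = n' i'."""
    return n * i / n_prime

n, n_prime = 1.0, 1.5   # illustrative indices, roughly air to glass

for degrees in (1, 5, 20, 45):
    i = math.radians(degrees)
    exact = snell_exact(i, n, n_prime)
    linear = snell_linear(i, n, n_prime)
    print(f"{degrees:2d} deg: exact {exact:.5f}, linear {linear:.5f}, "
          f"difference {linear - exact:.2e}")
```

The difference grows roughly like the cube of the angle, which is why linear optics is reliable only for rays making small angles with the normal.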

Gaussian optics is a special case of linear optics in which it is assumed that all 
the surfaces that enter are rotationally symmetric about a central axis. This is a very 
important special case since all ground lenses and most polished mirrors have this
property. We can summarize our discussion in figure 9.5.


9.2. Matrix methods 


In Gaussian optics we are interested in tracing the trajectory of a light ray as it 
passes through the various refracting surfaces of the optical system (or is reflected by 
reflecting surfaces). We introduce a coordinate system so that the z-axis (pointing 
from left to right in our diagram) coincides with the optical axis (i.e., the axis of 
symmetry of our system). We shall restrict attention to coaxial rays - those that 
lie in a plane with the optical axis.* 

By rotational symmetry, it is clearly sufficient to restrict attention to rays lying in 
one fixed plane. The trajectory of a ray, as it passes through the various refracting 
surfaces of the system, will consist of a series of straight lines. Our problem is to relate 
the straight line of the ray after it emerges from the system to the entering straight
line. For this we need to have a way of specifying straight lines. We do so as follows:
we choose some fixed z value. This amounts to choosing a plane perpendicular to the
optical axis, called the reference plane. Then a straight line is specified by two
numbers, its height, q, above the axis at z, and the angle θ that the line makes with the
optical axis. The angle θ will be measured in radians and considered positive if a
counterclockwise rotation carries the positive z-direction into the direction of the
ray along the straight line. It is convenient to choose new reference planes, suitably



Figure 9.6 

adjusted to each stage in the calculation. Thus, for example, if light enters our optical 
system from the left and emerges from the right, we would choose one reference 
plane z x to the left of the system of lenses and a second reference plane z 2 to the right. 

* Although this is introduced here as a simplifying assumption, it can be proved that linearity 
implies that the study of the most general ray can be reduced to the study of coaxial rays by 
projection onto two perpendicular components. 








A ray enters the system as a straight line specified by q₁ and θ₁ at z₁ and emerges as a
straight line specified by q₂ and θ₂ at z₂. Our problem, for any system of lenses, is to
find the relation between (q₂, θ₂) and (q₁, θ₁).

Now comes a simple but crucial step, of far reaching significance, which is basic to 
the geometry of optics and of mechanics. 

Replace the variable θ by p = nθ where n is the index of refraction of the medium at
the reference plane. (In mechanics, the corresponding step is to replace velocity by 
momentum.) 


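We thus describe a light ray by the vector (q, p) and our problem is to find

$$\begin{pmatrix} q_2 \\ p_2 \end{pmatrix} \text{ as a function of } \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}.$$

Since we are ignoring all terms quadratic or higher, it follows
from our approximation that (q₂, p₂) is a linear function of (q₁, p₁), i.e., that

$$\begin{pmatrix} q_2 \\ p_2 \end{pmatrix} = M_{21} \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}$$

for some matrix M₂₁. The key effect of our choice of p instead of θ as variable is the
assertion that

Det M₂₁ = 1,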

in other words, that the study of Gaussian optics is equivalent to the study of the
group of 2 x 2 real matrices of determinant one, the group Sl(2, ℝ). To prove
this, observe that if we have three reference planes, z₁, z₂, and z₃, situated so that the
light ray going from z₁ to z₃ passes through z₂, then by definition

$$M_{31} = M_{32} M_{21} .$$

Thus, if our optical system is built out of two components, we need only verify
Det M = 1 for each component separately. To simplify the exposition, assume that
our system does not contain mirrors.
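The reduction to individual components rests on the fact that the determinant of a product is the product of the determinants. A small numerical sketch of the composition rule follows. The two component matrices are hypothetical, chosen by us only so that each has determinant one; they are not derived from any particular optical element, and the helper names `matmul` and `det` are our own.

```python
def matmul(A, B):
    """Product of two 2 x 2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2))
        for i in range(2))

def det(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

# Hypothetical component matrices, each with determinant 1.
M21 = ((1.0, 0.5), (0.0, 1.0))    # from reference plane z1 to z2
M32 = ((1.0, 0.0), (-2.0, 1.0))   # from reference plane z2 to z3

# The ray passes through the first component and then the second, so the
# matrix from z1 to z3 is the product with M21 acting first.
M31 = matmul(M32, M21)

print(M31, det(M21), det(M32), det(M31))
```

Note the order of the factors: the matrix of the component the ray meets first stands on the right.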


The basic components 

Any refracting lens system can be considered as the composite of several 








(a) A translation, in which the ray continues to travel in a straight line between
two reference planes lying in the same medium. To describe such a system we must
specify the gap, t, between the planes and the refractive index, n, of the medium. It is
clear for such a system that $\theta$ and hence p do not change and that $q_2 = q_1 + (t/n)p_1$.

Figure 9.8

We write T = t/n (called the reduced distance) and see that

$$\begin{pmatrix} q_2 \\ p_2 \end{pmatrix} = \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}, \qquad \operatorname{Det} \begin{pmatrix} 1 & T \\ 0 & 1 \end{pmatrix} = 1.$$

(b) Refraction at the boundary surface between two regions of differing refractive
index. We must specify the curvature of the surface and the two indices of refraction,
$n_1$ and $n_2$. The two reference planes will be taken immediately to the left and
immediately to the right of the surface.

At such a surface, the q value does not change. The angle, and hence
the p, changes according to (the linearized version of) Snell’s law. Now Snell’s law
involves the slope of the tangent to the surface at the point of refraction. In our
approximation, we are ignoring quadratic terms in this slope, and hence terms of
degree three or higher in the surface. We may thus assume that the
intersection of this surface with our plane is a parabola

$$z - z_1 = \tfrac{1}{2} k q^2.$$


Then the derivative of z with respect to q is $z'(q) = kq$, which is $\tan(\pi/2 - \psi)$ where
$\psi$ is the angle in figure 9.9. For small angles $\theta$, i.e., for small values of q, $\psi$ will be close
to $\pi/2$ and hence we may replace $\tan(\pi/2 - \psi)$ by $\pi/2 - \psi$, if we are willing to drop
higher order terms in q or p. Thus $\pi/2 - \psi = kq$ is our Gaussian approximation. On
the other hand, if $(\pi/2 - i_1)$ denotes the angle that the incident ray makes with this
tangent line, then the fact that the interior angles of a triangle add up to $\pi$
shows that $(\pi - \psi) + \theta_1 + (\pi/2 - i_1) = \pi$, or

$$i_1 = \theta_1 + kq$$

and similarly

$$i_2 = \theta_2 + kq$$

where $q = q_1 = q_2$ is the point where the rays hit the refracting surface. Multiplying
the first equation by $n_1$ and the second equation by $n_2$, and using Snell’s law in the
approximate form $n_1 i_1 = n_2 i_2$, give

$$\begin{pmatrix} q_2 \\ p_2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -P & 1 \end{pmatrix} \begin{pmatrix} q_1 \\ p_1 \end{pmatrix}$$

where $P = (n_2 - n_1)k$ is called the power of the refracting surface.
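The two basic components can be tried out numerically. The following is a minimal sketch rather than anything taken from the text: the function names (`matmul`, `det`, `apply`, `translation`, `refraction`) and the sample values are our own illustrative assumptions. It builds the translation and refraction matrices described above, composes them, and prints the determinant of the product, which should come out equal to 1 up to rounding.

```python
# Illustrative sketch only: 2x2 optical matrices acting on a ray (q, p).
# A matrix is a tuple of rows ((A, B), (C, D)); all function names are hypothetical.

def matmul(m, n):
    """Product m*n of two 2x2 matrices."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def det(m):
    (a, b), (c, d) = m
    return a * d - b * c


def apply(m, ray):
    """Send the incoming ray (q1, p1) to the outgoing ray (q2, p2)."""
    (a, b), (c, d) = m
    q, p = ray
    return (a * q + b * p, c * q + d * p)


def translation(t, n):
    """Gap t in a medium of index n: q2 = q1 + (t/n) p1, p2 = p1."""
    return ((1.0, t / n), (0.0, 1.0))


def refraction(k, n1, n2):
    """Surface z - z1 = k q^2 / 2 separating index n1 (left) from n2 (right)."""
    power = (n2 - n1) * k
    return ((1.0, 0.0), (-power, 1.0))


# 10 units of air, a surface of curvature k = 0.2 into glass (n = 1.5), 3 units of glass.
# The component met first by the ray stands rightmost in the product.
system = matmul(translation(3.0, 1.5),
                matmul(refraction(0.2, 1.0, 1.5), translation(10.0, 1.0)))

print("matrix:", system)
print("determinant:", det(system))
print("ray (q, p) = (1, 0) leaves as", apply(system, (1.0, 0.0)))
```

The ordering of the factors mirrors the rule $M_{31} = M_{32} M_{21}$: the matrix of the component the ray meets last is written on the left.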


Conjugate planes 

Thus each Gaussian optical system between two reference planes corresponds to a
matrix

$$M = \begin{pmatrix} A & B \\ C & D \end{pmatrix} \quad \text{with } AD - BC = 1$$

and one can set up a dictionary which translates properties of the matrix into optical
properties.

For instance, the two planes are called conjugate (or in focus with one another) if, for
any $q_1$ at $z_1$, all the light rays leaving $q_1$ converge to the same point $q_2$ at $z_2$. This of
course means that $q_2$ should not depend on $p_1$, i.e., that

$$B = 0.$$


The thin lens

Notice that the product of two matrices of the form $\begin{pmatrix} 1 & 0 \\ -P & 1 \end{pmatrix}$
again has this same form:

$$\begin{pmatrix} 1 & 0 \\ -P_1 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -P_2 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ -(P_1 + P_2) & 1 \end{pmatrix}.$$

This gives the equation for the so-called thin lens consisting of refracting surfaces
with negligible separation between them. In this case, the reference planes $z_1$ and $z_2$
can conveniently both be taken to coincide with the plane of the lens. The plane $z_1$
relates, of course, to rays incident from the left, while $z_2$ relates to rays which emerge
from the lens and continue to the right.

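The matrix for the left refracting surface is

$$\begin{pmatrix} 1 & 0 \\ -(n_2 - n_1)/R_1 & 1 \end{pmatrix}.$$

The matrix for the right refracting surface is

$$\begin{pmatrix} 1 & 0 \\ -(n_1 - n_2)/R_2 & 1 \end{pmatrix}.$$

(Note that $R_2$ is negative in figure 9.10.) Multiplying these matrices, we find that the
matrix for the thin lens is

$$\begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix}$$

where

$$1/f = (n_2 - n_1)(1/R_1 - 1/R_2).$$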
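As a check on the formula for 1/f, one can multiply the two refraction matrices directly. This is a sketch under stated assumptions: the curvature of each surface is taken to be k = 1/R, and the helper names and the sample radii are hypothetical choices of ours.

```python
# Sketch: a thin lens as the product of two refraction matrices (names are illustrative).

def matmul(m, n):
    """Product m*n of two 2x2 matrices given as tuples of rows."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def refraction(k, n_left, n_right):
    """Refraction matrix with power P = (n_right - n_left) * k."""
    return ((1.0, 0.0), (-(n_right - n_left) * k, 1.0))


def thin_lens_from_surfaces(r1, r2, n_lens, n_outside=1.0):
    """Left surface of radius r1, right surface of radius r2, zero separation."""
    left = refraction(1.0 / r1, n_outside, n_lens)
    right = refraction(1.0 / r2, n_lens, n_outside)
    return matmul(right, left)  # the ray meets the left surface first


def focal_length(lens_matrix):
    """Read f off the lower left entry -1/f."""
    return -1.0 / lens_matrix[1][0]


# Double-convex lens in vacuum: R1 > 0, R2 < 0, n2 = 1.5.
lens = thin_lens_from_surfaces(r1=10.0, r2=-10.0, n_lens=1.5)
predicted = 1.0 / ((1.5 - 1.0) * (1.0 / 10.0 - 1.0 / (-10.0)))
print("f from matrix product:", focal_length(lens))
print("f from the formula   :", predicted)
```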


We shall assume that the lens is in a vacuum, so $n_1 = 1$ and $n_2 > 1$. In the case where
$R_1$ is positive, $R_2$ is negative, and $n_2 - n_1 > 0$ (a double-convex lens), the focal length
f is positive. If we calculate the matrix of the thin lens between a reference plane $F_1$
located a distance f to the left of the lens and a reference plane $F_2$ located a distance f
to the right, we find

$$\begin{pmatrix} 1 & f \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix} \begin{pmatrix} 1 & f \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & f \\ -1/f & 0 \end{pmatrix}.$$

The plane $F_1$ is called the first focal plane. If a ray, incident on the lens, passes
through this plane at $q_1 = 0$ with slope $p_1$, then the outgoing ray has

$$\begin{pmatrix} q_2 \\ p_2 \end{pmatrix} = \begin{pmatrix} 0 & f \\ -1/f & 0 \end{pmatrix} \begin{pmatrix} 0 \\ p_1 \end{pmatrix} = \begin{pmatrix} f p_1 \\ 0 \end{pmatrix},$$

i.e., it has zero slope and so is parallel to the axis. Conversely, if the incident ray has
zero slope, the outgoing ray has

$$\begin{pmatrix} q_2 \\ p_2 \end{pmatrix} = \begin{pmatrix} 0 & f \\ -1/f & 0 \end{pmatrix} \begin{pmatrix} q_1 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ -q_1/f \end{pmatrix},$$

i.e., it crosses the axis in the second focal plane. More generally, we can see that $p_2$ is
independent of $p_1$, so that incident rays passing through a given point in the first
focal plane emerge as parallel rays, all with the same slope. Furthermore, $q_2$ is
independent of $q_1$, so that incident rays with a common slope all emerge to pass
through the same position in the second focal plane.

As a simple illustration of the use of matrix methods to locate an image, suppose
that we take reference plane $z_1$ to lie a distance $s_1$ to the left of a thin lens, while $z_2$ lies
a distance $s_2$ to the right of the lens. Between these planes, the matrix is

$$\begin{pmatrix} 1 & s_2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix} \begin{pmatrix} 1 & s_1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 - s_2/f & s_1 + s_2 - s_1 s_2/f \\ -1/f & 1 - s_1/f \end{pmatrix}.$$

The planes are conjugate if the upper right entry of this matrix is zero. Thus we
obtain $1/s_1 + 1/s_2 = 1/f$, the well-known thin lens equation. We shall write this as

$$s_1 + s_2 - P s_1 s_2 = 0,$$

where $P = 1/f$.

We can solve this equation for $s_2$ so long as $s_1 \neq 1/P$. Thus each plane other
than the one corresponding to $s_1 = f$ has a unique conjugate plane. For $s_1 = f$, i.e.,
at the first focal plane, all light rays entering from a single point q emerge parallel, so
the conjugate plane to the first focal plane is ‘at infinity’. A similar discussion (with
right and left interchanged) applies to the second focal plane.


For $s_1 \neq f$ and $s_2$ corresponding to the conjugate plane, the magnification is

$$m = \frac{q_2}{q_1} = 1 - \frac{s_2}{f} = -\frac{s_2}{s_1}.$$

If $s_1$ and $s_2$ are both positive (object to left of lens, image to right), then the
magnification is negative, which means that the image is inverted.
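The image-location argument can be replayed with exact rational arithmetic. In this sketch the helper names and the numbers ($s_1 = 30$, $f = 10$) are assumptions chosen for illustration; the distance $s_2$ is obtained from the thin lens equation, and the matrix between the two planes is then formed to confirm that its upper right entry vanishes and that its upper left entry equals $-s_2/s_1$.

```python
from fractions import Fraction

# Sketch with hypothetical helper names; matrices are tuples of rows ((A, B), (C, D)).

def matmul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def translation(s):
    """Translation through the reduced distance s."""
    return ((1, s), (0, 1))


def thin_lens(f):
    return ((1, 0), (-1 / f, 1))


def image_distance(s1, f):
    """Solve 1/s1 + 1/s2 = 1/f for s2; there is no finite answer when s1 = f."""
    if s1 == f:
        raise ValueError("object in the first focal plane: image at infinity")
    return s1 * f / (s1 - f)


s1, f = Fraction(30), Fraction(10)
s2 = image_distance(s1, f)
between = matmul(translation(s2), matmul(thin_lens(f), translation(s1)))
magnification = between[0][0]

print("s2 =", s2)
print("matrix between the planes:", between)
print("magnification:", magnification)
```

Using `Fraction` keeps the conjugacy test `B == 0` exact, which would be fragile with floating-point numbers.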

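By multiplying matrices, it is straightforward to construct the matrix for any
combination of thin lenses. For instance, for two thin lenses of focal lengths $f_1$
and $f_2$, separated by distance l in air, we find the matrix

$$\begin{pmatrix} 1 & 0 \\ -1/f_2 & 1 \end{pmatrix} \begin{pmatrix} 1 & l \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1/f_1 & 1 \end{pmatrix} = \begin{pmatrix} 1 - l/f_1 & l \\ l/(f_1 f_2) - 1/f_2 - 1/f_1 & 1 - l/f_2 \end{pmatrix}$$

between the reference plane $z_1$ (first lens) and $z_2$ (second lens).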







The telescope 

A particularly interesting situation arises when $l = f_1 + f_2$, for then the matrix
takes the form

$$\begin{pmatrix} A & B \\ 0 & D \end{pmatrix},$$

i.e., C = 0. This means that $p_2 = D p_1$, i.e., that the outgoing directions depend only on
the incoming directions. The condition is satisfied in the astronomical telescope,
which consists of an objective lens of large positive focal length $f_1$ and an eyepiece of
small positive focal length $f_2$, separated by a distance $f_1 + f_2$. Such a telescope
converts parallel rays from a distant star into parallel rays which are presented to the
eye.

The angular magnification is the ratio of the slope of the
outgoing rays to the slope of the incoming rays, which equals

$$D = 1 - \frac{l}{f_2} = 1 - \frac{f_1 + f_2}{f_2} = -\frac{f_1}{f_2}.$$

This magnification is negative (the image is inverted) and its magnitude is the ratio
of the focal length of the objective to that of the eyepiece.
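The telescopic condition is easy to confirm for particular numbers. The sketch below assumes an objective with $f_1 = 100$ and an eyepiece with $f_2 = 5$ (arbitrary sample values) and uses hypothetical helper names; it forms the two-lens matrix with $l = f_1 + f_2$ and inspects its entries.

```python
from fractions import Fraction

# Sketch with illustrative names; matrices are tuples of rows ((A, B), (C, D)).

def matmul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def translation(l):
    return ((1, l), (0, 1))


def thin_lens(f):
    return ((1, 0), (-1 / f, 1))


def two_lens_system(f1, f2, l):
    """Lens f1, then a gap l in air, then lens f2 (the first lens stands rightmost)."""
    return matmul(thin_lens(f2), matmul(translation(l), thin_lens(f1)))


f1, f2 = Fraction(100), Fraction(5)
telescope = two_lens_system(f1, f2, f1 + f2)
(A, B), (C, D) = telescope

print("C =", C)                      # vanishes for a telescopic system
print("angular magnification D =", D)
```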


The general system 

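We now want to show that any 2 x 2 matrix with determinant 1 can arise as the
matrix of some optical system. First of all, suppose that the matrix is telescopic,
i.e., C = 0. Then $A \neq 0$, and if $P \neq 0$, then

$$\begin{pmatrix} 1 & 0 \\ -P & 1 \end{pmatrix} \begin{pmatrix} A & B \\ 0 & D \end{pmatrix} = \begin{pmatrix} A & B \\ -PA & D - PB \end{pmatrix}$$

has $PA \neq 0$, so is not telescopic. We shall show that every matrix
$\begin{pmatrix} A & B \\ C & D \end{pmatrix}$ with $C \neq 0$ can be written as

$$\begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} \tag{9.1}$$

and thus arises as an optical matrix. If C = 0, then we need only multiply
$\begin{pmatrix} A & B \\ -PA & D - PB \end{pmatrix}$ by $\begin{pmatrix} 1 & 0 \\ P & 1 \end{pmatrix}$ on the left to get
$\begin{pmatrix} A & B \\ 0 & D \end{pmatrix}$, so it too is an optical
matrix. To prove (9.1), consider

$$\begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} 1 & t \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} A + sC & t(A + sC) + B + sD \\ C & Ct + D \end{pmatrix}.$$

Since $C \neq 0$, we can choose s so that A + sC = 1 and then choose t = -(B + sD).
The resulting matrix has 1 in the upper left-hand corner and zero in the upper
right-hand corner. This implies that the lower right-hand corner is also 1 so that
the matrix on the right has the form

$$\begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix}$$

and this proves our assertion.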
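The choice of s and t in the proof is completely explicit, so it can be carried out mechanically. The sketch below is an illustration under our own naming (`shear`, `reduce_to_principal_form`), applied to the sample matrix with rows (2, 3) and (1, 2); it computes s and t, checks that the sandwiched matrix has rows (1, 0) and (C, 1), and then rebuilds the original matrix in the form (9.1) by undoing the two translations.

```python
from fractions import Fraction

# Sketch with hypothetical names; matrices are tuples of rows ((A, B), (C, D)).

def matmul(m, n):
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h), (c * e + d * g, c * f + d * h))


def shear(s):
    """Translation-type matrix with rows (1, s) and (0, 1)."""
    return ((1, s), (0, 1))


def reduce_to_principal_form(m):
    """Return s, t and shear(s) * m * shear(t), following the proof in the text."""
    (a, b), (c, d) = m
    if c == 0:
        raise ValueError("telescopic matrix (C = 0): the construction does not apply")
    s = (1 - a) / Fraction(c)   # makes A + sC = 1
    t = -(b + s * d)            # makes the upper right entry vanish
    return s, t, matmul(shear(s), matmul(m, shear(t)))


m = ((2, 3), (1, 2))            # determinant 2*2 - 3*1 = 1
s, t, core = reduce_to_principal_form(m)
rebuilt = matmul(shear(-s), matmul(core, shear(-t)))

print("s =", s, " t =", t)
print("core matrix:", core)
print("rebuilt matrix:", rebuilt)
```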


Gauss decomposition 

Notice that s and t were uniquely determined. Thus, for any non-telescopic optical
system, there are two unique planes such that the matrix between them has the
form

$$\begin{pmatrix} 1 & 0 \\ C & 1 \end{pmatrix}.$$

These planes are conjugate to one another and have magnification one. Gauss called
them the principal planes. If we start with the optical matrix
$\begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix}$ between the two principal planes, we can proceed exactly as for
the thin lens, to find the conjugate plane to any plane. All we have to do is write
C = -P = -1/f. For instance, the two focal planes are located f units to the
left and right of the principal planes:

$$\begin{pmatrix} 1 & f \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ -1/f & 1 \end{pmatrix} \begin{pmatrix} 1 & f \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 0 & f \\ -1/f & 0 \end{pmatrix}.$$

Gauss gave the following interpretation in terms of ray tracing of the decomposition
we derived above for the more general non-telescopic system. Suppose a ray
$\begin{pmatrix} q_1 \\ 0 \end{pmatrix}$, parallel to the axis, enters the system at $z_1$. When it reaches the second principal
plane, it is at the same height but is bent so as to pass through the second
focal point. Similarly, a ray passing through the first focal
point is bent at the first principal plane into a ray parallel to the axis and arrives
at $z_2$ still parallel to the axis, at the same height above the axis.


We see that the most general optical system which is not telescopic can be 
expressed simply in terms of three parameters - the location of the two principal 
planes and the focal length. (We know that there should be three parameters, since 
there are only three free parameters in the matrix, the fourth matrix coefficient 
being determined by the fact that the determinant must equal 1.) 

Once we have located the principal planes, we have also located the focal planes
by

$$H_1 - F_1 = f \quad \text{and} \quad F_2 - H_2 = f.$$

If we use the two focal planes as the reference planes for our system, then, by the
very definition of focal planes, we know that the optical matrix for these two planes
must have zeros in the upper left-hand corner and in the lower right-hand corner.
Thus the matrix between the two focal planes is given by

$$\begin{pmatrix} 0 & f \\ -1/f & 0 \end{pmatrix}.$$

Suppose that we now consider two other planes, $y_1$ and $y_2$, related to the focal planes
by

$$F_1 - y_1 = n_1 x_1 \quad \text{and} \quad y_2 - F_2 = n_2 x_2.$$

The matrix between these two planes will be

$$\begin{pmatrix} 1 & x_2 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & f \\ -1/f & 0 \end{pmatrix} \begin{pmatrix} 1 & x_1 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} -x_2/f & f - x_1 x_2/f \\ -1/f & -x_1/f \end{pmatrix}.$$

We see that $y_1$ and $y_2$ are conjugate if and only if $x_1 x_2 = f^2$ (this is known as
Newton’s equation), in which case the magnification is given by

$$m = -x_2/f = -f/x_1.$$

We can summarize the results of this section as follows: Let $Sl(2, \mathbb{R})$ denote the
group of all 2 x 2 matrices of determinant 1. We have shown that there is a
correspondence between Gaussian optical systems and the matrices in this group:
each optical system gives such a matrix, every such matrix arises from some optical
system, and multiplication of matrices corresponds to composition of the
corresponding systems.
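Newton’s equation can also be confirmed by direct multiplication. In this sketch the names and the sample values (f = 10, $x_1 = 20$) are assumptions made for illustration; $x_2$ is chosen so that $x_1 x_2 = f^2$, and the matrix between the two shifted planes is then inspected.

```python
from fractions import Fraction

# Sketch with illustrative names; matrices are tuples of rows ((A, B), (C, D)).

def matmul(m, n):
    (a, b), (c, d) = m
    (e, g), (h, k) = n
    return ((a * e + b * h, a * g + b * k), (c * e + d * h, c * g + d * k))


def shear(x):
    """Translation through the reduced distance x."""
    return ((1, x), (0, 1))


def between_focal_planes(f):
    """Matrix from the first focal plane to the second."""
    return ((0, f), (-1 / f, 0))


def between_shifted_planes(x1, x2, f):
    """Start x1 before the first focal plane, end x2 beyond the second."""
    return matmul(shear(x2), matmul(between_focal_planes(f), shear(x1)))


f, x1 = Fraction(10), Fraction(20)
x2 = f * f / x1                       # Newton's equation x1 * x2 = f^2
m = between_shifted_planes(x1, x2, f)

print("x2 =", x2)
print("matrix:", m)
print("magnification:", m[0][0])
```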


9.3. Hamilton’s method in Gaussian optics 

Suppose that $z_1$ and $z_2$ are planes in an optical system which are not conjugate.
This means that the B term in the optical matrix is not zero. Thus, from the equations

$$q_2 = A q_1 + B p_1$$
$$p_2 = C q_1 + D p_1$$

we can solve for $p_1$ and $p_2$ in terms of $q_1$ and $q_2$ as

$$p_1 = (1/B)(q_2 - A q_1)$$

and

$$p_2 = (1/B)(D q_2 - q_1),$$

where the second formula uses AD - BC = 1. This has the following geometric
significance: given a point $q_1$ on the $z_1$-plane and a point $q_2$ on the $z_2$-plane, there
exists a unique light ray joining these two points. (This is exactly what fails to
happen if the planes are conjugate. For conjugate planes, if $q_2$ is the image of $q_1$,
there will be an infinity of light rays joining $q_1$ and $q_2$; in fact, all light rays leaving
$q_1$ arrive at $q_2$. If $q_2$ is not the image of $q_1$, then there will be no light ray joining
$q_1$ and $q_2$.) Let $W = W(q_1, q_2)$ be the function

$$W(q_1, q_2) = \frac{1}{2B}\left(A q_1^2 + D q_2^2 - 2 q_1 q_2\right) + K,$$

where K is a constant. Then we can write the equations for $p_1$ and $p_2$ as

$$p_1 = -\frac{\partial W}{\partial q_1} \quad \text{and} \quad p_2 = \frac{\partial W}{\partial q_2}.$$

Hamilton called this function the point characteristic of the system. In the modern
physics literature this function is sometimes called the eikonal. Suppose that $z_1$, $z_2$
and $z_3$ are planes such that no two of them are conjugate, with $z_1 < z_2 < z_3$, and
such that $z_2$ does not coincide with a refracting surface. Let $W_{21}$ be the point
characteristic for the $z_1$-$z_2$ system and let $W_{32}$ be the point characteristic for the
$z_2$-$z_3$ system. We claim that (up to an irrelevant additive constant) the point
characteristic for the $z_1$-$z_3$ system is given by

$$W_{31}(q_1, q_3) = W_{21}(q_1, q_2) + W_{32}(q_2, q_3)$$

where, in this equation, $q_2 = q_2(q_1, q_3)$ is taken to be the point where the ray from
$q_1$ to $q_3$ hits the $z_2$-plane. To see this, observe that, since $z_2$ does not coincide with
a refracting surface, the value of $p_2$ computed from $W_{21}$ agrees with the value
computed from $W_{32}$, so that

$$\frac{\partial W_{21}}{\partial q_2} + \frac{\partial W_{32}}{\partial q_2} = p_2 - p_2 = 0.$$

Now apply the chain rule to conclude that $\partial W_{31}/\partial q_1 = -p_1$ and similarly that
$\partial W_{31}/\partial q_3 = p_3$ at $(q_1, q_3)$.
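The defining property of the point characteristic, $p_1 = -\partial W/\partial q_1$ and $p_2 = \partial W/\partial q_2$, can be checked numerically. The sketch below is an illustration with names of our own choosing and an arbitrary sample matrix with $B \neq 0$. Because W is quadratic, a central difference quotient reproduces its partial derivatives exactly when rational arithmetic is used.

```python
from fractions import Fraction

# Sketch with hypothetical names; the optical matrix is ((A, B), (C, D)) with B != 0.

def point_characteristic(m, q1, q2, k=0):
    """W(q1, q2) = (A q1^2 + D q2^2 - 2 q1 q2) / (2B) + K."""
    (a, b), (c, d) = m
    if b == 0:
        raise ValueError("conjugate planes (B = 0): W is not defined")
    return (a * q1 * q1 + d * q2 * q2 - 2 * q1 * q2) / (2 * Fraction(b)) + k


def momenta(m, q1, q2):
    """Solve q2 = A q1 + B p1, p2 = C q1 + D p1 for (p1, p2)."""
    (a, b), (c, d) = m
    b = Fraction(b)
    return (q2 - a * q1) / b, (d * q2 - q1) / b


def central_difference(func, x, h=Fraction(1, 100)):
    """Exact for quadratic functions, which is all we need here."""
    return (func(x + h) - func(x - h)) / (2 * h)


m = ((2, 3), (1, 2))                       # determinant 1, B = 3
q1, q2 = Fraction(1), Fraction(4)
p1, p2 = momenta(m, q1, q2)
dw_dq1 = central_difference(lambda x: point_characteristic(m, x, q2), q1)
dw_dq2 = central_difference(lambda y: point_characteristic(m, q1, y), q2)

print("p1 =", p1, " -dW/dq1 =", -dw_dq1)
print("p2 =", p2, "  dW/dq2 =", dw_dq2)
```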

The function W is determined by the above properties only up to an additive
constant. Hamilton showed that, by an appropriate choice of the constant, we can
arrange that $W(q_1, q_2)$ is the optical length of the light ray joining $q_1$ to $q_2$, where
the optical length is defined as follows. For a line segment of length l in a medium
of constant index of refraction, n, the optical length is nl. A path, $\gamma$, is defined to
be a broken line segment, where each component segment lies in a medium of
constant index of refraction. If the component segments have length $l_i$ and lie in
media of refractive index $n_i$, then the optical length of $\gamma$ is

$$L(\gamma) = \sum n_i l_i.$$

Let us prove Hamilton’s result within the framework of our Gaussian optics
approximation. Our approximation is such that terms in p and q of degree
higher than one are dropped from the derivatives of W. Thus, in computing optical
length and W, we must retain terms up to degree two but may ignore terms higher
than the second. We will prove this by establishing the following general formula
for the optical length (in the Gaussian approximation) of a light ray $\gamma$ whose
incoming coordinates are $(q_1, p_1)$ and whose outgoing coordinates are $(q_2, p_2)$:

$$L(\gamma) = L_{axis} + \tfrac{1}{2}(p_2 q_2 - p_1 q_1)$$

where $L_{axis}$ denotes the optical length from $z_1$ to $z_2$ of the axis ($p_1 = q_1 = 0 = p_2 = q_2$)
of the system. Notice that once this is proved, then, if we assume that $z_1$ and $z_2$
are nonconjugate, we can express $p_1$ and $p_2$ as functions
of $q_1$ and $q_2$, i.e.,
substituting $p_1 = (1/B)(q_2 - A q_1)$, $p_2 = (1/B)(D q_2 - q_1)$ into the above formula gives
our expression for W with $K = L_{axis}$.

To prove the above formula for $L(\gamma)$, we observe that it behaves correctly when
we combine systems: if we have $z_1$, $z_2$ and $z_3$, then the length along the axis certainly
adds, and $\tfrac{1}{2}(p_2 q_2 - p_1 q_1) + \tfrac{1}{2}(p_3 q_3 - p_2 q_2) = \tfrac{1}{2}(p_3 q_3 - p_1 q_1)$. So we need only
prove the formula for our two fundamental cases.

(1) If n is constant,

$$L(\gamma) = n\left(d^2 + (q_2 - q_1)^2\right)^{1/2}$$
$$\approx nd + \frac{1}{2}\frac{n}{d}(q_2 - q_1)^2$$
$$= nd + \frac{1}{2}\left[\frac{n}{d}(q_2 - q_1)\right](q_2 - q_1)$$
$$= nd + \tfrac{1}{2} p (q_2 - q_1)$$

where $p_2 = p_1 = p = (n/d)(q_2 - q_1)$ is the formula which holds for this case.

(2) At a refracting surface, $z' - z = \tfrac{1}{2} k q^2$ with index of refraction $n_1$ to the left
and $n_2$ to the right. Here the computation must be understood in the following
sense. Suppose we choose some point $z_3$ to the left and some point $z_4$ to the right
of our refracting surface. If $n_1$ were equal to $n_2$, the optical length would be
$n_1 l_3 + n_2(l + l_4)$ where $l_3$ is the portion of the ray to the left of our plane and $l + l_4$
is the portion to the right, and where $l_4$ is the portion to the right of the surface.
(We have drawn the figure with k > 0, but a similar argument works for k < 0.) If
$n_2 \neq n_1$, then $n_2 l_4$ will be different, but would be calculated by (1) from z to $z_4$. In
other words, the effect of the refracting surface is to replace $n_2 l$ by $n_1 l$ in the above
expression, i.e., to modify the optical length by

$$(n_1 - n_2) l.$$

This is the contribution at the refracting surface. Now

$$l = (z'' - z) \sec\theta_1,$$

where $z''$ is the z-coordinate of the point at which the ray meets the surface, at height

$$q'' = (\tan\theta_1)(z'' - z) + q.$$

It is clear that up to terms of higher order we may take $z'' = z' = \tfrac{1}{2} k q^2 + z$ and
replace $\sec\theta_1$ by 1, so

$$(n_1 - n_2) l = \tfrac{1}{2} k (n_1 - n_2) q^2 = -\tfrac{1}{2} P q^2 = \tfrac{1}{2}(p_2 q_2 - p_1 q_1)$$

since $q_2 = q_1 = q$ and $p_2 = p_1 - Pq$ at a refracting surface, where $P = k(n_2 - n_1)$.

This completes the proof of our formula.


9.4. Fermat’s principle 

Let us consider a refracting surface with power $P = (n_2 - n_1)k$ located at z. Here
P might be zero. Consider planes $z_1$ to the left and $z_2$ to the right of z. We assume
constant index of refraction between $z_1$ and z and between z and $z_2$. Let $q_1$ be a
point on the $z_1$-plane, $q_2$ a point on the $z_2$-plane and q a point on the z-plane.
Consider the path consisting of three pieces: the light ray joining $q_1$ to q, across
the surface of refraction at q and then the light ray joining q to $q_2$. This path will not,
in general, be an optical path, since q can be arbitrary. However, its optical length is
given, in the Gaussian approximation, by the sum of three terms, as we saw in the
last section:

$$L(q_1, q, q_2) = L_{axis} + \tfrac{1}{2}\left(p_1(q - q_1) + p_2(q_2 - q) - P q^2\right).$$

In this expression,

$$p_1 = \frac{n_1}{d_1}(q - q_1) \quad \text{and} \quad p_2 = \frac{n_2}{d_2}(q_2 - q),$$

where $d_1 = z - z_1$ and $d_2 = z_2 - z$, so we can write

$$L = L_{axis} + \frac{1}{2}\left(\frac{d_1}{n_1} p_1^2 + \frac{d_2}{n_2} p_2^2 - P q^2\right).$$

Suppose that we hold $q_1$ and $q_2$ fixed, and look for that value of q which
extremizes L: in other words, we wish to solve the equation $\partial L/\partial q = 0$ for fixed
values of $q_1$ and $q_2$. Substituting into the last expression for L, together with the facts
that $\partial p_1/\partial q = n_1/d_1$ and $\partial p_2/\partial q = -n_2/d_2$, we obtain the equation

$$p_1 - p_2 - Pq = 0.$$

In other words:

$$p_2 = -Pq + p_1.$$

But this is precisely the relation between $p_1$ and $p_2$ given by the refraction matrix
at z.


We have thus proved the following fact. Let us fix $q_1$ and $q_2$ and consider the
set of paths joining $q_1$ to $q_2$ which consist of two segments, from $q_1$ to q and from q
to $q_2$. Among all such paths, the actual light ray can be characterized as that path
for which L takes on an extreme value, i.e., for which

$$\frac{\partial L}{\partial q} = 0.$$

This is (our Gaussian approximation to) the famous Fermat principle of least time.
Let us substitute $p_1 = (n_1/d_1)(q - q_1)$ and $p_2 = (n_2/d_2)(q_2 - q)$ into our formula for
L to obtain a third expression for L:

$$L = n_1 d_1 + n_2 d_2 + \frac{1}{2}\left[\frac{n_1}{d_1}(q - q_1)^2 + \frac{n_2}{d_2}(q_2 - q)^2 - P q^2\right].$$

The coefficient of $q^2$ is $\tfrac{1}{2}(n_1/d_1 + n_2/d_2 - P)$. Thus the extremum is a minimum if

$$n_1/d_1 + n_2/d_2 - P > 0$$

and a maximum if

$$n_1/d_1 + n_2/d_2 - P < 0.$$

If P > 0 we see that we get a minimum for small values of $d_1$ and $d_2$ but a maximum
for large values of $d_1$ and $d_2$. The situation is indeterminate (and we cannot, in
general, solve for q) when

$$n_1/d_1 + n_2/d_2 = P$$

which is precisely the condition that the planes be conjugate. Thus, we get a
minimum if the conjugate plane to $z_1$ does not lie between $z_1$ and $z_2$, and a
maximum otherwise. The fact that L is minimized only up to the first conjugate
point is true in a more general setting, where it is known as the Morse index theorem.

A similar situation arises for light
being reflected from a concave spherical mirror. We take a point Q inside the
sphere and let the light shine along a diameter so that it bounces back to Q. Then
it is clear that the distance to the mirror is a local minimum if Q is closer to the
mirror than the center, and a maximum otherwise.
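The extremal property can be illustrated with exact arithmetic. In the sketch below, the function names and the sample data ($n_1 = 1$, $d_1 = 2$, $n_2 = 3/2$, $d_2 = 3$, P = 1/4, $q_1 = 1$, $q_2 = 2$) are assumptions of ours; P is taken to be $(n_2 - n_1)k$, so that refraction gives $p_2 = p_1 - Pq$. The stationary point of L is found by solving the linear equation $\partial L/\partial q = 0$, and the refraction law is then verified at that point.

```python
from fractions import Fraction as F

# Sketch with hypothetical names, using the "third expression" for L from the text.

def optical_length(q, q1, q2, n1, d1, n2, d2, power):
    """Gaussian optical length of the broken path q1 -> q -> q2."""
    quadratic = (n1 / d1) * (q - q1) ** 2 + (n2 / d2) * (q2 - q) ** 2 - power * q ** 2
    return n1 * d1 + n2 * d2 + F(1, 2) * quadratic


def stationary_point(q1, q2, n1, d1, n2, d2, power):
    """Solve dL/dq = 0; return q together with n1/d1 + n2/d2 - P."""
    a, b = n1 / d1, n2 / d2
    coeff = a + b - power
    if coeff == 0:
        raise ValueError("the planes are conjugate: q is not determined")
    return (a * q1 + b * q2) / coeff, coeff


n1, d1, n2, d2, power = F(1), F(2), F(3, 2), F(3), F(1, 4)
q1, q2 = F(1), F(2)
q_star, coeff = stationary_point(q1, q2, n1, d1, n2, d2, power)
p1 = (n1 / d1) * (q_star - q1)
p2 = (n2 / d2) * (q2 - q_star)

print("stationary q:", q_star)
print("p1 =", p1, " p2 =", p2, " p1 - P q =", p1 - power * q_star)
print("minimum" if coeff > 0 else "maximum")
```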


9.5. From Gaussian optics to linear optics 

What happens if we drop the assumption of rotational symmetry but retain the
approximation that all terms higher than the first order in the angles and distances
to the axis can be ignored? First of all, in specifying a ray, we now need four variables:
$q_x$ and $q_y$, which specify where the ray intersects a plane transverse to the z-axis,
and two angles, $\theta_x$ and $\theta_y$, which specify the direction of the ray. A direction in
three-dimensional space is specified by a unit vector, $v = (v_x, v_y, v_z)$. If v is close to
pointing in the positive z-direction, it will have the form $v = (\theta_x, \theta_y, v_z)$, where
$v_z = 1 - \tfrac{1}{2}(\theta_x^2 + \theta_y^2) \approx 1$, provided $\theta_x$ and $\theta_y$ are small. Again, we replace the $\theta$
variables by p variables, where $p_x = n\theta_x$ and $p_y = n\theta_y$. (If the medium is anisotropic,
as is the case in certain kinds of crystals, the relation between the $\theta$ variables and
the p variables can be more complicated, but we will not concern ourselves with
that here.) All of this, of course, is taking place at some fixed plane. If we consider
two planes $z_1$ and $z_2$, the ray will correspond to vectors

$$u_1 = \begin{pmatrix} q_{x1} \\ q_{y1} \\ p_{x1} \\ p_{y1} \end{pmatrix} \quad \text{and} \quad u_2 = \begin{pmatrix} q_{x2} \\ q_{y2} \\ p_{x2} \\ p_{y2} \end{pmatrix}$$

at the respective planes.


Our problem is to find the form of the relationship between $u_1$ and $u_2$. Since
we are ignoring all higher-order terms, we know that

$$u_2 = M u_1,$$

where M is some 4 x 4 matrix. Our problem is to ascertain what kind of 4 x 4
matrices can actually arise in linear optics. The most obvious guess is that M must
satisfy the requirement Det M = 1. This is not the right answer, however. It is true
that the matrices which arise in linear optics have determinant 1, but not all
4 x 4 matrices of determinant 1 can actually arise as transformation matrices in
linear optics. There is a stronger condition that must be imposed. In order to
explain what this stronger condition is, we first go back and reformulate the
condition that a 2 x 2 matrix has determinant 1. We then formulate this condition
in four variables. Let

$$w = \begin{pmatrix} q \\ p \end{pmatrix} \quad \text{and} \quad w' = \begin{pmatrix} q' \\ p' \end{pmatrix}$$

be two vectors in the plane. We defined in section 4.9 an antisymmetric ‘product’,
$\omega(w, w')$, between these two vectors by the formula

$$\omega(w, w') = qp' - q'p.$$

The geometric meaning of $\omega(w, w')$ is that it represents the oriented area of the
parallelogram spanned by the vectors w and w' (see figure 9.18). It is clear from
both the definition and the geometry that $\omega$ is antisymmetric:

$$\omega(w, w') = -\omega(w', w).$$

A 2 x 2 matrix preserves area and orientation if and only if its determinant
equals 1. Hence a 2 x 2 matrix A has determinant 1 if and only if
$\omega(Aw, Aw') = \omega(w, w')$ for all w and w'.

More generally, let V be any real vector space. A bilinear form $\Omega$ on V is called
antisymmetric if $\Omega(u, v) = -\Omega(v, u)$ for all u and v in V. We say that $\Omega$ is nondegenerate if the linear function
$\Omega(u, \cdot)$ is not identically zero unless u itself is zero. An antisymmetric, nondegenerate
bilinear form on V is called a symplectic form. A vector space possessing a given
symplectic form is called a symplectic vector space, or is said to have a symplectic
structure. If V is a symplectic vector space with symplectic form $\Omega$, and if A is a
linear transformation of V into itself, we say that A is a symplectic transformation
if $\Omega(Au, Av) = \Omega(u, v)$ for all u and v in V. It is a theorem (cf. Guillemin & Sternberg,
Symplectic Techniques in Physics, Chapter II) that every symplectic vector space
must be even-dimensional and that every symplectic linear transformation must
have determinant 1 and, hence, be invertible. It is clear that the inverse of any
symplectic transformation must be symplectic and that the product of any two
symplectic transformations must be symplectic. The collection of all symplectic
linear transformations of V therefore forms a group, known as the symplectic group of V.



Now let us assume that $V = \mathbb{R}^n + \mathbb{R}^n$ and write the typical vector in V as

$$u = \begin{pmatrix} q \\ p \end{pmatrix}, \quad \text{where } q = \begin{pmatrix} q_1 \\ \vdots \\ q_n \end{pmatrix} \text{ and } p = \begin{pmatrix} p_1 \\ \vdots \\ p_n \end{pmatrix},$$

and define

$$\Omega(u, u') = q \cdot p' - q' \cdot p,$$

where $\cdot$ denotes ordinary scalar product in $\mathbb{R}^n$. In terms of the scalar product
$u \cdot u' = q \cdot q' + p \cdot p'$ we can write this as

$$\Omega(u, u') = u' \cdot Ju,$$

where J is the 2n x 2n matrix $\begin{pmatrix} 0 & -I \\ I & 0 \end{pmatrix}$ and I is the n x n identity matrix. A linear
transformation T on V is symplectic if, for all u and u',

$$\Omega(Tu, Tu') = \Omega(u, u').$$

We can write this as

$$T^t J T u \cdot u' = J u \cdot u',$$

where $T^t$ denotes the transpose of T relative to the scalar product on V. Since
this is to hold for all u and u' we must have

$$T^t J T = J.$$

We can write

$$T \begin{pmatrix} q \\ p \end{pmatrix} = \begin{pmatrix} Aq + Bp \\ Cq + Dp \end{pmatrix},$$

where A, B, C, and D are n x n matrices; that is,

$$T = \begin{pmatrix} A & B \\ C & D \end{pmatrix}, \qquad T^t = \begin{pmatrix} A^T & C^T \\ B^T & D^T \end{pmatrix},$$

where $A^T$ denotes the n-dimensional transpose of A, etc. The condition $T^t J T = J$
becomes the conditions $A^T C = C^T A$, $B^T D = D^T B$, and $A^T D - C^T B = I$. Notice that
$T^{-1}$, which is also symplectic, is given by

$$T^{-1} = \begin{pmatrix} D^T & -B^T \\ -C^T & A^T \end{pmatrix},$$

and so we also have

$$DC^T = CD^T \quad \text{and} \quad BA^T = AB^T.$$
We now turn to the problem of justifying the assertion that the group of linear
optical transformations is precisely the group of symplectic matrices. The argument
has two parts. The first is a physical argument showing that (in our linear
approximation) the matrix $\begin{pmatrix} I & 0 \\ -P & I \end{pmatrix}$, where $P = P^T$ is a symmetric matrix,
corresponds to refraction at a surface between two regions of constant index of
refraction (and that every P can arise) and that $\begin{pmatrix} I & dI \\ 0 & I \end{pmatrix}$ corresponds to motion
in a medium of constant index of refraction, where d is the optical distance along
the axis. The second is a mathematical argument showing that every symplectic
matrix can be written as a product of matrices of the above types.
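The symplectic condition $T^t J T = J$ is straightforward to test for 4 x 4 matrices. The sketch below is illustrative only: the helper names are hypothetical, J is taken with upper right block $-I$ and lower left block $I$, and the refraction block is written with $-P$ in the lower left corner (the condition itself does not depend on either sign choice). It checks a refraction with symmetric P, a translation, and their product, and then exhibits a diagonal matrix of determinant 1 that fails the test.

```python
from fractions import Fraction

# Sketch with hypothetical names; matrices are lists of rows.

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def block(a, b, c, d):
    """Assemble the 4x4 matrix with 2x2 blocks a, b (top) and c, d (bottom)."""
    return [ra + rb for ra, rb in zip(a, b)] + [rc + rd for rc, rd in zip(c, d)]


I2 = [[1, 0], [0, 1]]
Z2 = [[0, 0], [0, 0]]
J = block(Z2, [[-1, 0], [0, -1]], I2, Z2)


def is_symplectic(t):
    return matmul(transpose(t), matmul(J, t)) == J


def refraction(p):
    """Block matrix with rows (I, 0) and (-P, I); symplectic exactly when P is symmetric."""
    return block(I2, Z2, [[-x for x in row] for row in p], I2)


def translation(d):
    """Block matrix with rows (I, dI) and (0, I)."""
    return block(I2, [[d, 0], [0, d]], Z2, I2)


surface = refraction([[Fraction(1, 5), Fraction(1, 20)], [Fraction(1, 20), Fraction(1, 8)]])
system = matmul(translation(Fraction(3)), matmul(surface, translation(Fraction(10))))
stretched = [[2, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, Fraction(1, 2)]]  # det = 1

print("refraction symplectic:", is_symplectic(surface))
print("composite symplectic :", is_symplectic(system))
print("det-1 diagonal matrix symplectic:", is_symplectic(stretched))
```

The last matrix shows concretely why Det M = 1 is too weak a requirement in four variables.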


We will omit the mathematical part, which is a rather tricky generalization of 
the arguments of section 9.2. We refer the reader to Guillemin & Sternberg 
Symplectic Techniques in Physics section 4, pp. 27-30. We concentrate on the 
physical aspects of the problem. As in Gaussian optics, we describe the incoming
light ray by its direction $v = (v_x, v_y, v_z)$ and its intersection with the plane parallel
to the xy-plane passing through the point z on the optical axis. Here $\|v\|^2 =
v_x^2 + v_y^2 + v_z^2 = 1$. Now

$$v_z = (1 - v_x^2 - v_y^2)^{1/2} = 1 - \tfrac{1}{2}(v_x^2 + v_y^2) + \cdots \approx 1,$$

since we are ignoring quadratic terms in $v_x$ and $v_y$, which are assumed small. We
set

$$p_x = n v_x, \quad p_y = n v_y,$$

where n is the index of refraction. Moving a distance t along the optical axis is
the same (up to quadratic terms in $v_x$ and $v_y$) as moving a distance t along the
line through v and hence

$$q_2 = q_1 + (t/n) p_1, \quad p_2 = p_1,$$

where $q = (q_x, q_y)$ and $p = (p_x, p_y)$; this is the matrix $\begin{pmatrix} I & dI \\ 0 & I \end{pmatrix}$ with d = t/n.

Figure 9.19

The basic formula for the optical length,

$$L = L_{axis} + \tfrac{1}{2}(p_2 \cdot q_2 - p_1 \cdot q_1),$$

holds as before. There is no point in repeating the proof.

Two planes are called nonconjugate if, in the optical matrix relating them, the
matrix B is nonsingular. Then we can solve the equations

$$q_2 = A q_1 + B p_1$$

and

$$p_2 = C q_1 + D p_1$$

for $p_1$ and $p_2$ as

$$p_1 = -B^{-1} A q_1 + B^{-1} q_2$$

and

$$p_2 = (C - D B^{-1} A) q_1 + D B^{-1} q_2.$$

We can then write

$$L = L_{axis} + W(q_1, q_2),$$

where

$$W(q_1, q_2) = \tfrac{1}{2}\left[D B^{-1} q_2 \cdot q_2 + B^{-1} A q_1 \cdot q_1 - 2 (B^T)^{-1} q_1 \cdot q_2\right].$$

(In proving this formula we make use of the identity

$$(B^T)^{-1} = D B^{-1} A - C,$$

which follows for nonsingular B from $B^T D = D^T B$ and $D^T A - B^T C = I$.) A direct
computation (using the above identity) shows that (in the obvious sense)

$$\frac{\partial L}{\partial q_2} = p_2$$

and

$$\frac{\partial L}{\partial q_1} = -p_1.$$

Thus a knowledge of L allows us to determine $p_1$ and $p_2$ in terms of $q_1$ and $q_2$.

We can now briefly describe the transition to (nonlinear) geometrical optics.
We can put the condition that the matrix A be symplectic in the following way.
Consider the two-form

$$\omega = dq_x \wedge dp_x + dq_y \wedge dp_y$$

on $\mathbb{R}^4$. Then the linear map $A: \mathbb{R}^4 \to \mathbb{R}^4$ is symplectic if and only if

$$A^* \omega = \omega,$$

in other words, the pullback of $\omega$ under A is again $\omega$. We can now call a differentiable
map $\phi$ symplectic if

$$\phi^* \omega = \omega.$$

We simply drop the condition that $\phi$ be linear. (In the older literature, symplectic
maps were called canonical transformations.) Hamilton showed that the maps (from
incoming to outgoing rays) in geometrical optics are precisely the symplectic maps.
He also showed that under appropriate non-conjugacy hypotheses, a symplectic
map is determined by the characteristic function L as above, where $L(q_1, q_2)$ is the
optical length of the path joining $q_1$ to $q_2$. (Of course, L no longer has the simple
formula given above.)

Some ten years after writing his fundamental papers on optics, Hamilton made 
a startling observation: that the same formalism applies to mechanics of point 
particles. Let $q_1, \ldots, q_n$ represent the (generalized) position coordinates of a system of
particles and $p_1, \ldots, p_n$ the corresponding momenta. Replace the optical axis, z, by
the time. Then the transformation from initial position and momenta to final 
position and momenta is always symplectic. This discovery led to remarkable 
progress in theoretical mechanics in the nineteenth century. In the 1920s - almost 
a century later - Hamilton’s analogy between optics and mechanics served as one 
of the major clues in the discovery of quantum mechanics. 


Summary 


A Matrix formulation of Gaussian optics

You should understand the use of a two-component vector to represent a ray
passing through a reference plane.

You should be able to develop and use the 2 x 2 matrices that represent the effect
of a translation, a refracting surface, or a thin lens.

B Lens systems

You should be able to calculate the matrix for a system of refracting surfaces or thin
lenses between two given reference planes.

Given such a lens system, you should be able to locate the principal planes and
focal planes, use them for ray tracing, and locate the image of a given object.


C Hamiltonian optics 

For a Gaussian optical system, you should know how to write down the 
Hamiltonian point characteristic between two reference planes and to use it to 
determine what ray connects a pair of points in the two planes. 


Exercises 

9.1. Figure 9.20 shows the focal planes and principal planes for a thick lens. 
Rays incident from the left which are parallel to the axis are refracted so 
that they pass through a focal point in the plane $F_2$, while rays emanating
from the focal point in the plane $F_1$ are refracted so that they emerge
parallel to the axis. Principal planes $H_1$ and $H_2$ are associated with $F_1$ and
$F_2$ respectively.

(a) By ray tracing on the diagram, locate the image of the object in the
plane $z_1$. Trace the ray $R_1$ plus two other rays.

(b) Use Newton’s equation to calculate the position of the image which
you located in (a). Specify the location of this image with respect to one
of the planes in figure 9.20.

(c) Construct the matrix of the system between planes $z_1$ and $z_2$. Use this
matrix to determine the position and slope of ray $R_1$ as it emerges from
the lens at $z_2$.



9.2. The thick lens shown in figure 9.21 is made of glass with n = f. Construct 
the matrix between reference planes $z_1$ and $z_2$. Locate the focal planes $F_1$
and $F_2$ and principal planes $H_1$ and $H_2$, and show them on a diagram. By
tracing rays on the diagram, locate the image of an object located 1 cm to
the left of $z_1$, and check your result by using Newton’s equation.

Figure 9.21 (thick lens; labels: 6 cm, |R| = 4 cm, |R| = 6 cm)

9.3. Suppose that you take ray tracing as the fundamental characterization of 
the properties of a thin lens; i.e., you assume that the intersection of a ray 
through the center of the lens with a ray which is parallel to the axis on the 
left and is bent through the focal point on the right determines the 
intersection of all the rays from a given object. 

(a) Derive the thin lens equation from this assumption. Consider only the 
case where p, q and f are all positive.

(b) Prove from the same assumptions that a thin lens can be represented 
by a 2 x 2 matrix, and derive the form of this matrix. 

9.4. A crystal ball of radius 6 cm is made of glass with index of refraction f. For 
rays which are close to a diameter, this crystal ball behaves like a linear
thick lens (i.e., a cylindrical core, with a diameter as its axis, is just a thick
lens). Construct the matrix for this lens between the reference planes $z_1$ and
$z_2$, between the focal planes, and between the principal planes. Draw a
diagram showing all these planes.

Figure 9.22

[Figure: two thin lenses, labelled $f = 12$ cm and $f = 8$ cm.]

... matrix between planes $z_1$ and $z_2$.

(b) Locate the focal planes $z_1'$ and $z_2'$, both by using the thin lens equation and
by using the matrix for the system, as described in the notes. Construct the 


(c) Locate the principal planes $H_1$ and $H_2$ and construct the matrix between
them. The easy way to do this is to use the fact that the focal length is $f = 6$
and that each principal plane is therefore 6 cm away from the corresponding
focal plane. Notice that both principal planes lie between the two
lenses, and that $H_1$ lies to the right of $H_2$ in this case.

(d) Make a diagram of this optical system, showing the focal planes and 
principal planes. 

(e) Let $z_3$ be the plane 12 cm to the left of $z_1$. Find the plane conjugate to this
plane in four ways: by matrix multiplication, by using the thin lens
equation twice, by using Newton’s equation $x_1 x_2 = f^2$, and by ray tracing.

9.6. A lens system consists of two thin lenses, whose focal lengths are $f_1$ and $f_2$
respectively, mounted a distance $t$ apart. The first focal plane is located at a
distance $l_1$ to the left of lens 1, the second focal plane is located a distance $l_2$
to the right of lens 2. Prove that the focal length $f$ of this system satisfies the
equation $f^2 - tf - l_1 l_2 = 0$. Bearing in mind that $f_1$, $f_2$, $l_1$, $l_2$ and $f$ all make
sense even if they are negative, determine which root of this quadratic
equation is physically meaningful.

9.7. Invent a system of thin lenses whose optical matrix is the identity matrix.
(Note: this takes several lenses. You might wish to start by

constructing a system whose matrix M satisfies M 2 — I.) 

9.8. A ray enters the optical system shown in figure 9.25 at $z_1$ with coordinates

(^0 = ( 2 )' coor ^ nates the outgoing ray at z 2 . 




Figure 9.25 



9.9. The thick lens shown in figure 9.26 is made of glass with index of refraction 


n = f. 


(a) Construct the matrix between reference planes $z_1$ and $z_2$.


(b) Determine what incoming ray ( ^ ) is transformed into the outgoing 

\PiJ 

(to} /T\ , 

ray ^ J = ^ j at plane z 2 . 


9.10. The converging lens shown in figure 9.27 has $f = 10$ cm. It is made of glass
with $n = 1.4$, and its two convex surfaces both have the same radius of
curvature $R$.

(a) Calculate $R$, and determine the thickness $b$ of the lens as a function of
the distance $q$ from the axis. (Note: $b(2) = 0$.)

(b) A ray from A will follow the path ACF. Show that this path requires a 
minimum time compared with any path which passes through the lens 
at a different value of q. 

(c) Show that the path ACB requires greater time than any other path
from A to B via the lens.

(d) Write down the function $W(q_A, q_F)$ for the planes of A and F, and




Figure 9.27 Not to scale. (Two distances of 30 cm are marked.)


show that Hamilton’s equations give the correct slopes for the ray with 
q A — — 0- Do the same for the planes of A and B. Finally, use 

W(q A , q B ) to determine what ray passes through the axis in the planes 
of both A and B. 


Gaussian optical system. The symplectic scalar product of these vectors is
defined by $\omega(v_1, v_2) = q_1 p_2 - q_2 p_1$.

(a) Show that this scalar product is preserved by the action of the optical
system: i.e., $\omega(v_1', v_2') = \omega(v_1, v_2)$, where $v_1'$ and $v_2'$ are the outgoing rays.

(b) Show that $\omega(v_1, v_2) = 0$ if $v_1$, $v_2$ denote rays which meet anywhere on
the axis.

(c) Suppose that two rays pass through the same point $q_1$ in reference
plane $z_1$, with an angle $\phi_1$ between them. If these rays meet in the
conjugate plane $z_2$ with angle $\phi_2$ between them, what is their distance
$q_2$ from the axis? (Assume $n = 1$ at planes $z_1$ and $z_2$.)

In Chapter 10 we go back and prove the basic facts about
finite-dimensional vector spaces and their linear transformations.
New concepts (somewhat hard to get used to at first) are introduced: those of
the dual space and the quotient space. These concepts will
prove crucial in what follows.


We have worked extensively with two-dimensional vector spaces, but so far always
with one of two specific models in mind. A vector space V was either the set of
displacements in an affine plane, or it was $\mathbb{R}^2$, the set of ordered pairs of real
numbers. By introducing coordinates, we were able to identify any two-dimensional
vector space with $\mathbb{R}^2$ and thereby to represent any linear transformation of the
space by a 2 x 2 matrix.

We shall now begin to view more general vector spaces from an abstract and 
axiomatic point of view. The advantage of this approach is that it will permit us 
to consider vector spaces that are not defined either in geometrical terms or as 
n-tuples of real numbers. It will turn out that any such vector space containing
only a finite number of linearly independent elements can be identified with $\mathbb{R}^n$ for
some integer n so that eventually we shall return to the study of $\mathbb{R}^n$ and the use of
matrices to represent linear transformations. In what follows, you should keep in
mind the familiar two-dimensional geometrical model of a vector space in order
to remind yourself that the definitions and axioms are reasonable. The emphasis
in the examples, however, will be on vector spaces that do not arise in a geometrical
context. Such vector spaces are part of the natural mathematical language of many
branches of physics, notably electromagnetic theory and quantum mechanics.


10.1. Properties of vector spaces 


We begin by repeating the basic definitions. 

A vector space, also known as a linear space, consists of a set of elements called
vectors which satisfy certain axioms listed below. We shall denote vector spaces
by capital letters, e.g., $V$, $W^*$, $C^1$, and elements by lower-case bold letters, e.g.,
$v$, $w_2$, $b'$.

Part of the characterization of a vector space V is a rule that assigns to any
two elements $v_1$ and $v_2$ a unique third element $v$, usually called the sum and denoted
$v_1 + v_2$. This operation satisfies the same axioms as addition of real numbers:

Commutative law: $v_1 + v_2 = v_2 + v_1$. (10.1)

Associative law: $(v_1 + v_2) + v_3 = v_1 + (v_2 + v_3)$. (10.2)

Existence of zero: There is an element $0$ such that $v + 0 = v$ for all $v$. (10.3)

Existence of negative: For any $v$ there is an element $-v$ such that $-v + v = 0$. (10.4)


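In some cases the operation of addition is defined directly in terms of the addition
of real numbers. For example, in $\mathbb{R}^2$ we define

$$\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} a_1 + b_1 \\ a_2 + b_2 \end{pmatrix}.$$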



Similarly, we might consider the two-dimensional vector space of all functions 
defined on a two-element set, {A, B}, with addition defined pointwise, so that 
h = f + g is the function with the property that 

$h(A) = f(A) + g(A)$,

$h(B) = f(B) + g(B)$.

As a final example, we might consider the space of all continuous functions on 
the interval [0,1], with addition again defined pointwise, so that if f and g are 
elements of the space, their sum is the function h given by h(x) = f(x) + g(x). In this 
case it is crucial to notice that, for any f and g, the sum h is also a continuous 
function and so lies in the vector space. 

In all these examples it is clear that the zero element and inverse element are
unique. In fact this is true in any vector space, but it need not be assumed, since
it is easily proved from the axioms. The proofs are left to the reader.

The other operation that must be defined as part of the characterization of a
vector space is multiplication of a vector by a scalar. Here the scalars are real
numbers, although one can also consider vector spaces in which vectors are multiplied
by complex numbers. Again the axioms are those of ordinary multiplication, so
that, if $c_1$ and $c_2$ are scalars and $v_1$ and $v_2$ are vectors, we have

Associative law: $c_1(c_2 v) = (c_1 c_2) v$. (10.5)

Distributive laws: $(c_1 + c_2) v = c_1 v + c_2 v$, $c(v_1 + v_2) = c v_1 + c v_2$. (10.6)

Multiplication by 1 is the identity: $1v = v$ for all $v$. (10.7)

Because the axioms of addition and scalar multiplication in any vector space 
are the same as in ordinary arithmetic, almost any property which is true in 
arithmetic is also true in vector algebra. Here is a list of such properties, all readily 
provable from the axioms. Think about these, and convince yourself that they 
really require proof: they are not true just by definition. 

(a) $0v = 0$

(b) $c0 = 0$

(c) $(-c)v = -(cv) = c(-v)$

(d) $v + v = 2v$, $v + v + v = 3v$, etc.

(e) If $av = 0$ then either $a = 0$ or $v = 0$

(f) $-(v + w) = -v - w$


10.2. The dual space 

Given any vector space V, we can consider the set of all linear functions from V
to $\mathbb{R}$. These form a vector space, called the dual space V*, as we shall now show.

We shall denote elements of V* by bold Greek letters and also, introducing a
convention which will be useful later on, identify them by superscripts rather than
by subscripts. Thus $v_1, v_2, \ldots$ are elements of V, while $\alpha^1, \alpha^2, \ldots$ are elements of V*.
The action of an element of V* on an element of V will be denoted by using square
brackets, e.g., $\alpha[v]$.

We define the sum of two elements of V* in the usual manner for functions:
i.e., for any $v \in V$, $(\alpha^1 + \alpha^2)[v] = \alpha^1[v] + \alpha^2[v]$. Since the sum of linear functions
is also a linear function, $\alpha^1 + \alpha^2$ is indeed an element of V*, and it is easy to
see that all the addition axioms (10.1)-(10.4) are satisfied, with the zero element in
V* being the zero function, which is certainly linear. Similarly, we define scalar
multiplication by

$(c\alpha)[v] = c(\alpha[v])$

and thus see immediately that $c\alpha$ is linear and that the axioms for multiplication
(10.5)-(10.7) are satisfied.

There is an amazing variety of ways in which elements of a dual space may be defined. Here are
some examples:

1. V is $\mathbb{R}^2$, with a typical element $v = \begin{pmatrix} x \\ y \end{pmatrix}$. Then V* may be identified with
the set of row vectors $\alpha = (a, b)$, with $\alpha[v] = ax + by$.


2. V is the space of all functions on the two-element set {A, B}. Then the rule
$\alpha^A$, which assigns to an element $f \in V$ its value on the element A, so that
$\alpha^A[f] = f(A)$, is an element of V*. In this case, in fact, the general element of V*
is of the form

$\alpha[f] = a f(A) + b f(B)$

for arbitrary a and b. What is interesting about this example is that we have
identified A with $\alpha^A$ and similarly can identify B with $\alpha^B$. Although an expression
like ‘aA + bB’ makes no sense, $a\alpha^A + b\alpha^B$ makes perfect sense as an element of V*.
Thus we have a procedure for associating a vector space to any finite set so that
the elements of the set become vectors: just take the dual space of the space of
functions on the set! This construction will prove useful in the theory of electric
networks.
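The construction in example 2 can be tried out numerically. The following is a minimal sketch, assuming that a function on the two-element set is stored as a Python dictionary; the names `alpha_A`, `alpha_B` and `combo` are chosen here for illustration and are not notation from the text.

```python
# A vector in V: a function on the two-element set {"A", "B"}, stored as a dict.
def add(f, g):
    """Pointwise sum of two functions on {A, B}."""
    return {x: f[x] + g[x] for x in f}

def scale(c, f):
    """Pointwise multiple of a function on {A, B}."""
    return {x: c * f[x] for x in f}

# Evaluation functionals: alpha_A[f] = f(A), alpha_B[f] = f(B).
def alpha_A(f):
    return f["A"]

def alpha_B(f):
    return f["B"]

def combo(a, b):
    """The element a*alpha_A + b*alpha_B of the dual space."""
    return lambda f: a * alpha_A(f) + b * alpha_B(f)

f = {"A": 1, "B": 2}
g = {"A": 2, "B": -3}
gamma = combo(3, 5)

# gamma is linear, so it really is an element of V*.
assert gamma(add(f, g)) == gamma(f) + gamma(g)
assert gamma(scale(7, f)) == 7 * gamma(f)
print(gamma(f))  # 3*f(A) + 5*f(B)
```

The expression `combo(3, 5)` plays the role of $3\alpha^A + 5\alpha^B$: the set elements A and B have no arithmetic of their own, but the evaluation functionals attached to them do.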


3. V is the space of differentiable functions $f(t)$ on the interval [0, 1]. Then all
the following are elements of V*:

10.3. Subspaces

Frequently a vector space W arises as a subspace of a larger vector space V with
addition and scalar multiplication defined in W just as in V. In such a case, since
V is known to satisfy all the vector space axioms, there is no need to check them
for W. All that must be done to confirm that W is a vector space is to show that
it is closed under addition and scalar multiplication; i.e., that for any $w_1, w_2 \in W$, the sum
$w_1 + w_2$ is an element of W, and, for any real number c and any $w \in W$, $cw$ is an
element of W. In particular, the zero vector must be an element of W.

In practice, subspaces are usually defined by one of two methods, either by
specifying a set of elements of V or a set of elements of V*.




Method 1. Let $w_1, w_2, \ldots, w_k$ be vectors in V. Then the set of all linear combinations
of these vectors is a subspace of V.

Method 2. Let $\alpha^1, \alpha^2, \ldots, \alpha^k$ be elements of V*. Then the set of all vectors $v$
satisfying

$\alpha^1[v] = 0$,
$\alpha^2[v] = 0$,
$\ldots$
$\alpha^k[v] = 0$

is a subspace of V. The proof is simple. Let $w_1$ and $w_2$ be two vectors in this set
W. Then, because the functions $\alpha^1, \alpha^2, \ldots$ are all linear,

$\alpha^i[w_1 + w_2] = \alpha^i[w_1] + \alpha^i[w_2] = 0$, $i = 1, 2, \ldots, k$

so that $w_1 + w_2 \in W$. Similarly,

$\alpha^i[cw] = c\,\alpha^i[w] = 0$

so that $cw \in W$. Thus W is closed under addition and scalar multiplication and is
a subspace. (It may, of course, be {0}, the zero subspace consisting of 0 alone.)

A familiar example of these two methods is the construction of a plane (through
the origin) in $\mathbb{R}^3$. Method 1 describes the plane in terms of two vectors that span
it; e.g.,

/A /A m - /^ r \ 

I 1 1 and I 1 1, or 1 2 and _0 . 

Vo/ Vv W \ V 

Method 2 describes the plane by means of a linear equation, e.g.,

$2x - 2y + z = 0$,

which is the same as saying that $\alpha[w] = 0$ where

$\alpha = (2, -2, 1)$ and $w = \begin{pmatrix} x \\ y \\ z \end{pmatrix}$.
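The agreement between the two methods can be checked by a short computation. This is only a sketch: the spanning vectors `u` and `w` below are illustrative choices made here (any two independent vectors satisfying the equation would do) and are not claimed to be the pair printed in the text.

```python
# Method 2: the plane is the set of vectors annihilated by alpha = (2, -2, 1).
alpha = (2, -2, 1)

def apply(alpha, v):
    """Value alpha[v] of a row vector on a column vector."""
    return sum(a * x for a, x in zip(alpha, v))

# Method 1: two vectors lying in the plane (illustrative choices).
u = (1, 1, 0)
w = (0, 1, 2)

def combination(s, t):
    """The linear combination s*u + t*w."""
    return tuple(s * a + t * b for a, b in zip(u, w))

# Every linear combination of u and w satisfies the equation of the plane.
for s in range(-3, 4):
    for t in range(-3, 4):
        assert apply(alpha, combination(s, t)) == 0

def coefficients(v):
    """For v in the plane, return (s, t) with v = s*u + t*w."""
    x, y, z = v
    assert apply(alpha, v) == 0, "v does not lie in the plane"
    return x, y - x

# Conversely, a vector satisfying the equation is a combination of u and w.
v = (3, 5, 4)
assert combination(*coefficients(v)) == v
```

The first loop checks that the span of `u` and `w` lies inside the plane; `coefficients` goes the other way, so the two descriptions give the same subspace.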


As another example, consider the space V of polynomial functions of degree
$\le 2$, with a typical element

$f(t) = a + bt + ct^2$.

A one-dimensional subspace W can be described by method 1 as the space of all
constant multiples of the function $1 - t^2$. The same subspace can alternatively be
described by method 2 in terms of the two conditions

10.4. Dimension and basis

To proceed further with the study of vector spaces, we need the notions of linear
dependence and linear independence of a set of vectors. A set of vectors $\{v_1, v_2, \ldots, v_k\}$
is said to be linearly dependent if there exist real numbers $\lambda_1, \lambda_2, \ldots, \lambda_k$, not all
zero, such that

$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_k v_k = 0$.

If this equation holds only for $\lambda_1 = \lambda_2 = \cdots = \lambda_k = 0$, then the set of vectors is said
to be linearly independent.

Here are some examples of these important concepts: 

1. Let V be $\mathbb{R}^3$, and consider


The set {v 1 ,v 2 ,v 3 } is linearly dependent because 

2vi + V, - vT^fPl+fr) 


v 

i /2\ 

as 

l-M 


On the other hand, the set $\{v_1, v_2\}$ is linearly independent, because


A 1 1 -r ' £ f — \ A 1 T +‘**2 I 

W W \ K ) 

and it is apparent on inspection that this last vector can only be zero if $\lambda_1 = 0$
and $\lambda_2 = 0$.

2. Let V be the space of functions on $[0, 2\pi]$ and consider

$v_1 = \cos^2 t$, $v_2 = \sin^2 t$, $v_3 = \cos 2t$.

This set of vectors is linearly dependent because $v_1 - v_2 - v_3 = 0$.
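The relation $v_1 - v_2 - v_3 = 0$ is the double-angle identity, and it can be spot-checked numerically. The sketch below samples the three functions on a grid of points in $[0, 2\pi]$; a numerical check at finitely many points is of course an illustration, not a proof.

```python
import math

def v1(t):
    return math.cos(t) ** 2

def v2(t):
    return math.sin(t) ** 2

def v3(t):
    return math.cos(2 * t)

# Sample the interval [0, 2*pi] at 101 equally spaced points.
samples = [2 * math.pi * i / 100 for i in range(101)]

# The combination v1 - v2 - v3 should vanish up to rounding error.
max_residual = max(abs(v1(t) - v2(t) - v3(t)) for t in samples)
assert max_residual < 1e-12
```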

3. Let V be the space of functions on the set {A, B}, and consider

$f_1$: $f_1(A) = 1$, $f_1(B) = 2$,

$f_2$: $f_2(A) = 2$, $f_2(B) = -3$,


function). 


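4. Let V be the space of polynomials of degree $\le 2$, and consider the following
elements of V*:

$\alpha: f \mapsto \int_{-2}^{2} t f(t)\,dt$,

$\beta: f \mapsto f'(0)$.

Writing $f(t) = A + Bt + Ct^2$ we find

$\alpha[f] = \int_{-2}^{2} (At + Bt^2 + Ct^3)\,dt = \tfrac{16}{3} B$

and

$\beta[f] = B$

so $(\alpha - \tfrac{16}{3}\beta)[f] = 0$ and the set $\{\alpha, \beta\}$ is linearly dependent.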
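The dependence of these two functionals can be confirmed with exact rational arithmetic. This is a sketch under the assumption that a polynomial $A + Bt + Ct^2$ is stored as the coefficient tuple `(A, B, C)`; the helper names are invented for this illustration.

```python
from fractions import Fraction

def integral_monomial(n, lo, hi):
    """Exact integral of t**n from lo to hi."""
    return Fraction(hi ** (n + 1) - lo ** (n + 1), n + 1)

def alpha(coeffs):
    """Integral over [-2, 2] of t*f(t), where f(t) = sum of coeffs[n] * t**n."""
    return sum(c * integral_monomial(n + 1, -2, 2) for n, c in enumerate(coeffs))

def beta(coeffs):
    """f'(0), which is the coefficient of t."""
    return Fraction(coeffs[1])

# alpha - (16/3)*beta vanishes on a basis of V, hence on all of V.
for coeffs in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (4, -7, 9)]:
    assert alpha(coeffs) - Fraction(16, 3) * beta(coeffs) == 0
```

Checking the relation on the three monomials $1$, $t$, $t^2$ is enough, because both functionals are linear; the fourth polynomial is included only as an extra spot check.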

It is probably clear from these examples that there are situations in which it 
may not be apparent on inspection whether a set of vectors is linearly dependent 
or independent. We shall have to develop a systematic procedure for investigating 
this question. 

We say that a set of vectors $\{v_1, v_2, \ldots, v_k\}$ spans a vector space V if any vector
$v \in V$ can be written as a linear combination $v = \sum_{i=1}^{k} \mu_i v_i$. (The set $\{v_1, \ldots, v_k\}$ may
be linearly dependent, in which case the coefficients $\mu_1, \ldots, \mu_k$ are not uniquely
determined.)




also spans $\mathbb{R}^3$, but the set



does not. 


2. Let V be the space of functions $f(t)$ on $[0, \infty)$ which satisfy the differential
equation

$f'' + 3f' + 2f = 0$.

The vectors $e^{-t}$ and $e^{-2t}$ span V, because the general solution to the equation is
of the form

$f(t) = Ae^{-t} + Be^{-2t}$.




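$\beta[f] = \int_{-1}^{1} t f(t)\,dt$

span the dual space V*. Clearly

$\alpha[f] = A$,

$\beta[f] = \int_{-1}^{1} (At + Bt^2)\,dt = \tfrac{2}{3} B$.

But any element $\gamma \in V^*$ must be of the form $\gamma[f] = aA + bB$ for some constants
a and b. Thus $\gamma = a\alpha + \tfrac{3}{2} b\beta$, and $\alpha$ and $\beta$ span V*.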

Let $\{v_1, \ldots, v_n\}$ be a finite set of linearly independent vectors that spans a vector
space V. The number n of vectors in such a collection is called the dimension of
V. To establish that dimension is a well-defined integer, i.e., that all such sets for
a given space contain the same number of vectors, we need the following
result:


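Theorem. Let $\{v_1, v_2, \ldots, v_k\}$ be a set of vectors that span a vector space V.
Then any set of k + 1 vectors in V is linearly dependent.

The proof is by induction: we first establish the result for k = 1; then we show
that if it is true for a space spanned by k - 1 vectors, it is true for a space
spanned by k vectors. When k = 1 the theorem states that, if V is spanned by one
vector $v$, then any two vectors $w_1$ and $w_2$ are linearly dependent. Since $v$ spans V,
there are numbers $\mu_1$ and $\mu_2$ such
that $w_1 = \mu_1 v$ and $w_2 = \mu_2 v$. Clearly, then,

$\mu_2 w_1 - \mu_1 w_2 = 0$

so that $w_1$ and $w_2$ are linearly dependent. (If $\mu_1 = \mu_2 = 0$, then $w_1 = 0$, and $1 \cdot w_1 = 0$ is
already a non-trivial relation.)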

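We now assume that the theorem is true for any set of k vectors in a space
spanned by k - 1 vectors, and we consider a set of k + 1 vectors, $\{w_1, \ldots, w_{k+1}\}$,
in a space spanned by $\{v_1, \ldots, v_k\}$. We can write

$w_1 = a_{11} v_1 + a_{12} v_2 + \cdots + a_{1k} v_k$

because the vectors $\{v_i\}$ span V. If $w_1 = 0$, then $r w_1 = 0$ with $r \ne 0$ gives a non-trivial
relation among the $w$s and there is nothing further to prove. So we may assume
that $w_1 \ne 0$ and hence we may as well assume that we have ordered the vectors
$\{v_1, v_2, \ldots, v_k\}$ so that $a_{11} \ne 0$. Thus

$$\frac{1}{a_{11}} w_1 = v_1 + \frac{a_{12}}{a_{11}} v_2 + \cdots + \frac{a_{1k}}{a_{11}} v_k.$$

But

$w_2 = a_{21} v_1 + a_{22} v_2 + \cdots + a_{2k} v_k$.

Thus

$$w_2 - \frac{a_{21}}{a_{11}} w_1 = \left(a_{22} - \frac{a_{21} a_{12}}{a_{11}}\right) v_2 + \cdots + \left(a_{2k} - \frac{a_{21} a_{1k}}{a_{11}}\right) v_k$$

and similarly we can express

$$w_3 - \frac{a_{31}}{a_{11}} w_1, \quad \text{etc.}$$

in terms of the k - 1 vectors $\{v_2, \ldots, v_k\}$. But we are assuming that the theorem is
true for k - 1 vectors, so that the set of k vectors

$$w_2 - \frac{a_{21}}{a_{11}} w_1, \quad w_3 - \frac{a_{31}}{a_{11}} w_1, \quad \ldots, \quad w_{k+1} - \frac{a_{(k+1)1}}{a_{11}} w_1$$

is linearly dependent. Thus there exist constants $\lambda_2, \ldots, \lambda_{k+1}$, not all zero, such that

$$\lambda_2\left(w_2 - \frac{a_{21}}{a_{11}} w_1\right) + \lambda_3\left(w_3 - \frac{a_{31}}{a_{11}} w_1\right) + \cdots + \lambda_{k+1}\left(w_{k+1} - \frac{a_{(k+1)1}}{a_{11}} w_1\right) = 0.$$

The coefficients of $w_2, \ldots, w_{k+1}$ in this relation are $\lambda_2, \ldots, \lambda_{k+1}$, which are not all zero.
But this means that $\{w_1, w_2, \ldots, w_{k+1}\}$ is a linearly dependent set, as we wished to
show.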
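The inductive argument is constructive, and it can be turned into a short recursive routine. The sketch below assumes that each $w_i$ is given by its row of coefficients $(a_{i1}, \ldots, a_{ik})$ with respect to the spanning vectors; the function name `find_dependence` is invented here. It uses the trivial case of zero spanning vectors as the base of the recursion instead of the case k = 1, which is a small departure from the proof as written.

```python
from fractions import Fraction

def find_dependence(vectors):
    """Given m coefficient rows of length n with m > n, return numbers
    lam[0..m-1], not all zero, with sum(lam[i] * vectors[i]) = 0."""
    vectors = [[Fraction(x) for x in v] for v in vectors]
    m, n = len(vectors), len(vectors[0])
    assert m > n, "need more vectors than spanning vectors"
    w1 = vectors[0]
    # If w1 = 0 then 1*w1 = 0 is already a non-trivial relation.
    if all(x == 0 for x in w1):
        return [Fraction(1)] + [Fraction(0)] * (m - 1)
    # 'Reorder' the spanning vectors: pick a column j with a_1j != 0.
    j = next(idx for idx, x in enumerate(w1) if x != 0)
    ratios = [w[j] / w1[j] for w in vectors[1:]]
    # w_i - (a_ij / a_1j) w_1 has no component along v_j; drop that column.
    reduced = [
        [wi[c] - r * w1[c] for c in range(n) if c != j]
        for wi, r in zip(vectors[1:], ratios)
    ]
    # Induction hypothesis: m - 1 vectors in a space spanned by n - 1 vectors.
    mu = find_dependence(reduced)
    # Expand sum(mu_i * (w_i - r_i * w_1)) = 0 to read off the coefficient of w_1.
    lam1 = -sum(m_i * r for m_i, r in zip(mu, ratios))
    return [lam1] + mu

w = [(1, 2), (3, 4), (5, 6)]
lam = find_dependence(w)
assert any(l != 0 for l in lam)
assert all(sum(l * v[c] for l, v in zip(lam, w)) == 0 for c in range(2))
```

For the three vectors in the usage lines, the routine finds a relation among them, which the final two assertions verify component by component.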

Now we can easily show that the dimension of a vector space is well-defined.
Suppose we have, in a vector space V, a collection $\{v_1, \ldots, v_k\}$ which is linearly
independent and spans V, and a second collection $\{w_1, \ldots, w_n\}$ with the same
properties. By the theorem
just proved, $n \le k$, otherwise the vectors $\{w_1, \ldots, w_n\}$ would be linearly dependent.
By the same argument $k \le n$, otherwise $\{v_1, \ldots, v_k\}$ would be linearly dependent.
We conclude that k = n, so that any finite collection of linearly independent and
spanning vectors in a vector space contains the same number of vectors. Hence
we have the right to call this number the dimension of V.


In fact, in a vector space V of dimension n, any set of n independent vectors
spans the space. Let $\{v_1, \ldots, v_n\}$ be a set of independent vectors in V and let $w$ be
an arbitrary non-zero vector. The set $\{w, v_1, \ldots, v_n\}$, which contains n + 1 vectors,
must be linearly dependent, so there exist constants $\lambda_0, \lambda_1, \ldots, \lambda_n$, not all zero, such that

$\lambda_0 w + \lambda_1 v_1 + \cdots + \lambda_n v_n = 0$.

Now $\lambda_0$ cannot be zero; otherwise $\{v_1, \ldots, v_n\}$ would be dependent, contrary to
hypothesis. Hence we can write

$$w = -\frac{1}{\lambda_0}(\lambda_1 v_1 + \cdots + \lambda_n v_n)$$

and we have expressed the arbitrary non-zero vector $w$ as a linear combination
of $\{v_1, \ldots, v_n\}$, which therefore spans.

A linearly independent and spanning collection of vectors, $\{v_1, \ldots, v_n\}$, when
written in a specified order is called a basis of V. Thus $\{v_1, v_2, \ldots, v_n\}$ is a different
basis from $\{v_2, v_1, v_3, \ldots, v_n\}$.

Starting with fewer than n independent vectors in an n-dimensional space V,
say $w_1, \ldots, w_k$ ($k < n$), we can always find a vector $v_{k+1}$ which is not in the subspace
spanned by $\{w_1, \ldots, w_k\}$. Continuing this process for $n - k$ steps, we eventually
arrive at a basis for V which includes the vectors $w_1, \ldots, w_k$. In particular, given
a vector space V of dimension n with a subspace W of dimension k, we can always
construct a basis for V in which the first k vectors form a basis for W. This process
is called extending a basis for the subspace W to a basis for the entire space V.
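The extension process can be sketched in code for the special case $V = \mathbb{R}^n$. The sketch assumes that candidate vectors are drawn from the standard basis, which is one convenient choice among many, and tests whether a candidate lies outside the current span by comparing ranks; the helper names `rank` and `extend_to_basis` are invented for this illustration.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(independent, n):
    """Extend a linearly independent list of vectors in R^n to a basis of R^n."""
    basis = [tuple(v) for v in independent]
    assert rank(basis) == len(basis), "starting vectors must be independent"
    for i in range(n):
        candidate = tuple(1 if j == i else 0 for j in range(n))
        # Adjoin the candidate only if it is not in the span of the current set.
        if rank(basis + [candidate]) > len(basis):
            basis.append(candidate)
    return basis

print(extend_to_basis([(1, 1, 0)], 3))
```

The vectors supplied at the start always remain the first elements of the result, which is the property used in the text: a basis for W becomes the first k vectors of a basis for V.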
Once we have chosen a basis, say $\{e_1, \ldots, e_n\}$, for a vector space V, we can write
any element of V uniquely as a linear combination of basis vectors

$v = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$.

The numbers $x_1, \ldots, x_n$ are called the components of $v$ with respect to the given
basis. To show that they are uniquely determined, we imagine that $v$ can be
expressed alternatively as

$v = y_1 e_1 + y_2 e_2 + \cdots + y_n e_n$.

Then, subtracting, we have

$0 = (x_1 - y_1) e_1 + (x_2 - y_2) e_2 + \cdots + (x_n - y_n) e_n$.

But, since the basis elements are linearly independent, $x_1 - y_1 = x_2 - y_2 = \cdots =
x_n - y_n = 0$, which proves the uniqueness of the components.

Thus, a basis determines an isomorphism, L, of V with $\mathbb{R}^n$, where

$$L v_1 = \begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad L v_2 = \begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \text{etc.}$$

Conversely, if L is such an isomorphism, then

$$v_1 = L^{-1}\begin{pmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}, \quad v_2 = L^{-1}\begin{pmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{pmatrix}, \quad \text{etc.}$$

is a basis of V. We may thus identify a basis $\{v_1, \ldots, v_n\}$ with the corresponding
isomorphism L, just as we did in Chapter 1 in the two-dimensional case.


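Let $L: V \to \mathbb{R}^n$ and $L': V \to \mathbb{R}^n$ be two bases of the same n-dimensional space, V.

$$\begin{array}{ccc} & V & \\ L \swarrow & & \searrow L' \\ \mathbb{R}^n & \xrightarrow{\;B\;} & \mathbb{R}^n \end{array}$$

Then $B = L' \circ L^{-1}$ is a linear isomorphism of $\mathbb{R}^n \to \mathbb{R}^n$, hence an invertible n x n
matrix. It is called the change of basis matrix.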

Let V be a vector space of dimension k and W a vector space of dimension l.
Let $T: V \to W$ be a linear transformation. Suppose that we choose bases of V and
of W. So we have isomorphisms $L: V \to \mathbb{R}^k$ and $M: W \to \mathbb{R}^l$ and we can define the map

$MTL^{-1}: \mathbb{R}^k \to \mathbb{R}^l$.

We can regard $MTL^{-1}$ as a matrix with l rows and k columns. We call $MTL^{-1}$
the matrix of T relative to the bases L and M, and denote it by $\mathrm{Mat}_{L,M}(T)$. So

$\mathrm{Mat}_{L,M}(T) = MTL^{-1}$.

We can picture the situation by the diagram

$$\begin{array}{ccc} V & \xrightarrow{\;T\;} & W \\ \downarrow L & & \downarrow M \\ \mathbb{R}^k & \xrightarrow{\;\mathrm{Mat}_{L,M}(T)\;} & \mathbb{R}^l \end{array}$$

If we make a different choice $L' = PL$ of basis on V and $M' = QM$ of basis on W,
then

$L'^{-1} = L^{-1}P^{-1}$

so

$M'TL'^{-1} = QMTL^{-1}P^{-1}$

or

$\mathrm{Mat}_{L',M'}(T) = Q\,\mathrm{Mat}_{L,M}(T)\,P^{-1}$ when $L' = PL$, $M' = QM$.

This is the change of basis formula. It tells us how the matrix representation of a linear
transformation changes when we change the basis.
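A small numerical check of the change of basis formula may help fix the roles of P and Q. The sketch below assumes $V = W = \mathbb{R}^2$, stores each basis as a matrix whose columns are the basis vectors, and takes the coordinate map to be the inverse of that matrix. All of the particular matrices are arbitrary choices made for the illustration.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def inv2(X):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = X
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

T = [[2, 1], [0, 3]]                       # T in standard coordinates
E, F = [[1, 1], [0, 1]], [[1, 0], [1, 1]]  # first bases of V and W (columns)
E2, F2 = [[2, 1], [1, 1]], [[1, 2], [0, 1]]  # second bases of V and W

L, M = inv2(E), inv2(F)      # coordinate maps for the first bases
L2, M2 = inv2(E2), inv2(F2)  # coordinate maps for the second bases

mat = matmul(matmul(M, T), inv2(L))     # Mat_{L,M}(T) = M T L^{-1}
mat2 = matmul(matmul(M2, T), inv2(L2))  # Mat_{L',M'}(T)

P = matmul(L2, inv2(L))  # L' = P L
Q = matmul(M2, inv2(M))  # M' = Q M

# Change of basis formula: Mat_{L',M'}(T) = Q Mat_{L,M}(T) P^{-1}.
assert mat2 == matmul(matmul(Q, mat), inv2(P))
```

Exact fractions are used so that the final comparison is an equality test and not a floating-point tolerance check.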

10.5. The dual basis 

Having constructed a basis for a vector space V we can readily construct a dual basis
for the dual space V*. Let $\{e_1, \ldots, e_n\}$ be a basis for V. Then any vector $v \in V$
can be written uniquely in the form

$v = x_1 e_1 + x_2 e_2 + \cdots + x_n e_n$.

Now let $\alpha$ be an element of V*. Since $\alpha$ is a linear function on V,

$\alpha[v] = x_1 \alpha[e_1] + x_2 \alpha[e_2] + \cdots + x_n \alpha[e_n]$.

This means that $\alpha$ is determined completely by its values on the basis vectors
$\{e_1, \ldots, e_n\}$. We therefore introduce vectors $\varepsilon^1, \ldots, \varepsilon^n$ in V* with the property that

$$\varepsilon^i[e_j] = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

To prove that the elements $\varepsilon^i$ are linearly independent, we consider $\sum_i \lambda_i \varepsilon^i$. Applying
this to an arbitrary basis element $e_j$, we obtain

$$\sum_{i=1}^{n} \lambda_i \varepsilon^i[e_j] = \lambda_j.$$

Thus, if $\sum_i \lambda_i \varepsilon^i$ is the zero element in V*, $\lambda_j = 0$ for all j. This proves that the set
$\{\varepsilon^1, \ldots, \varepsilon^n\}$ is linearly independent.

Now, given any $\alpha \in V^*$, we write

$\alpha = \alpha[e_1]\varepsilon^1 + \alpha[e_2]\varepsilon^2 + \cdots + \alpha[e_n]\varepsilon^n$.

Clearly both sides of this expression have the same value on any basis element $e_j$
and so are the same element of V*. This proves that the elements $\varepsilon^1, \ldots, \varepsilon^n$ span V*.
Since these elements are also independent, we conclude that V* is also n-dimensional
and $\{\varepsilon^1, \ldots, \varepsilon^n\}$ form a basis for it.

We can use this basis to identify V* with $\mathbb{R}^{n*}$. When we express an element $\alpha \in V^*$
in terms of the dual basis:

$\alpha = \lambda_1 \varepsilon^1 + \lambda_2 \varepsilon^2 + \cdots + \lambda_n \varepsilon^n$

we find it convenient to identify elements of $\mathbb{R}^{n*}$ as row vectors. So $\alpha$ becomes
identified with the row vector $(\lambda_1, \lambda_2, \ldots, \lambda_n)$. An advantage of this notation is that
the action of $\alpha$ on $v$ is then described by the usual rule for multiplying matrices:

$$\alpha[v] = (\lambda_1, \lambda_2, \ldots, \lambda_n)\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \lambda_1 x_1 + \lambda_2 x_2 + \cdots + \lambda_n x_n.$$

It is important to bear in mind that this technique is correct only if the identification
of V and V* has been done consistently: the basis used in identifying V* with $\mathbb{R}^{n*}$
must be dual to the basis used in identifying V with $\mathbb{R}^n$.
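In $\mathbb{R}^2$ the dual basis can be computed explicitly: if the basis vectors are written as the columns of a matrix, the rows of the inverse matrix are the dual basis elements, since row i of the inverse times column j of the matrix gives 1 or 0. The sketch below illustrates this with a basis chosen arbitrarily for the example; the names are not from the text.

```python
from fractions import Fraction

def inv2(X):
    """Inverse of an invertible 2 x 2 matrix, in exact arithmetic."""
    (a, b), (c, d) = X
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

def act(row, col):
    """Action of a row vector (element of the dual) on a column vector."""
    return sum(r * c for r, c in zip(row, col))

# An illustrative basis of R^2, written as the columns of E.
e1, e2 = (1, 1), (1, 2)
E = [[e1[0], e2[0]], [e1[1], e2[1]]]

# The rows of the inverse matrix are the dual basis.
eps1, eps2 = inv2(E)

assert act(eps1, e1) == 1 and act(eps1, e2) == 0
assert act(eps2, e1) == 0 and act(eps2, e2) == 1

# The dual basis reads off components: x_i = eps^i[v].
v = (3, 5)
x1, x2 = act(eps1, v), act(eps2, v)
assert (x1 * e1[0] + x2 * e2[0], x1 * e1[1] + x2 * e2[1]) == v
```

The last assertion shows the consistency requirement at work: the row vectors `eps1`, `eps2` recover the components of `v` only because they are dual to the same basis `e1`, `e2` used to expand `v`.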


Suppose now that we have an n-dimensional space V with a k-dimensional
subspace W. We can choose a basis $\{v_1, \ldots, v_n\}$ for V in which the first k vectors form a basis for
W, and introduce the dual basis $\{\alpha^1, \ldots, \alpha^n\}$ for V*, so that

$$\alpha^i[v_j] = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases}$$

The $(n - k)$-dimensional subspace spanned by $\{\alpha^{k+1}, \ldots, \alpha^n\}$ is called the annihilator
space of W, denoted $W^\perp$. It derives its name from the fact that if $\alpha \in W^\perp$ and $w \in W$,
then $\alpha[w] = 0$; that is, $W^\perp$ ‘annihilates’ the subspace W. What was earlier called
method 2 for describing a subspace was in fact a specification in terms of the
annihilator space. For example, the vector $(a, b, c)$ defines a one-dimensional
subspace $W^\perp$ of the dual of $\mathbb{R}^3$. The subspace W of $\mathbb{R}^3$ annihilated by $W^\perp$ is
two-dimensional: it is the plane $ax + by + cz = 0$. If we specify two independent elements
of the dual of $\mathbb{R}^3$, $(a_1, b_1, c_1)$ and $(a_2, b_2, c_2)$, then the subspace of $\mathbb{R}^3$ annihilated by
these is one-dimensional: it is the line which satisfies the pair of equations

$a_1 x + b_1 y + c_1 z = 0$,
$a_2 x + b_2 y + c_2 z = 0$.

We shall need a procedure for calculating a basis of a subspace
spanned by specified elements of a vector space, or of a subspace annihilated
by specified elements of the dual space.

10.6. Quotient spaces 

We continue to consider an n-dimensional vector space V with a subspace W of
dimension k. 

It seems reasonable that there should be a space of dimension n — k which is in 
some sense the ‘difference’ between V and W. This space is called the quotient space 
V/W. Its elements are not elements of V, however; they are sets of elements of V called 
equivalence classes. Before defining these classes, we should first see why something 
simpler will not suffice. 

For a concrete example of a vector space V with subspace W, we can take V to be
the plane $\mathbb{R}^2$ and W a line in the plane, as depicted in figure 10.1. One possibility for
forming the ‘difference’ between V and W would be to consider the set of elements of
V which are not in W. Alas, these span the entire space V; for example, in figure 10.1,
the vectors $v_1$ and $v_2$, neither of which is in W, clearly span the entire plane.

Another possibility would be to choose a basis of k vectors for W, extend it to a
basis for V, and form the subspace which is spanned by the n - k basis vectors which
are not in W. This gives a subspace of the desired dimension, but one which depends
on an arbitrary choice of basis elements and so is not well-defined. For example, in
figure 10.1, we select $w$ as the first basis vector, and we could then choose $v_1$, $v_2$, or $v_3$
as a second basis vector, obtaining quite a different subspace with each choice. If
there were a scalar product defined on V, we could select the subspace orthogonal to
W, but, lacking a scalar product, there is no way to prescribe a choice of the second
basis element.

The construction which works is to define equivalence classes (modulo W), each
consisting of a set of vectors in V whose differences all lie in W. We denote the
equivalence class of a vector by writing a bar over it; thus, for example, $\bar{v}$ denotes the
set of all vectors of the form $v + w$, where $v$ is a specified element of V and $w$ is an
arbitrary element of W.



Figure 10.2 


Referring to figure 10.2, we see, for example, that $\bar{0}$, the equivalence class of the
zero vector, is the subspace W, a line through the origin. The vectors $v_1$ and $v_2$, which
differ by an element of W, belong to the same equivalence class, which we may denote
$\bar{v}_1$ or $\bar{v}_2$. This equivalence class is a line which does not pass through the origin. The
equivalence class $\bar{v}_3$ is a different line, again not passing through the origin. In this
case the equivalence classes are a family of lines parallel to W. More generally, we
can view a subspace W as a k-dimensional hyperplane through the origin of V and the
equivalence classes modulo W as a family of hyperplanes parallel to this one.

To introduce the operation of addition of equivalence classes, we look first at the
arithmetic of the integers modulo 4, with which you are probably familiar. Here there
are four equivalence classes:

$\bar{0} = \{0, 4, -4, 8, -8, \ldots\} = \{4n\}$,

$\bar{1} = \{1, 5, -3, 9, -7, \ldots\} = \{4n + 1\}$,

$\bar{2} = \{2, 6, -2, 10, -6, \ldots\} = \{4n + 2\}$,

$\bar{3} = \{3, 7, -1, 11, -5, \ldots\} = \{4n + 3\}$.

To add two equivalence classes, we select any integer from each class, add these
together and then find the class to which the sum belongs. For example, to add $\bar{2}$ and
$\bar{3}$, we might choose 2 from $\bar{2}$ and 3 from $\bar{3}$; their sum, 5,
belongs to the class $\bar{1}$. So $\bar{2} + \bar{3} = \bar{1}$. Since any other choice (e.g., $-2 + (-1) = -3$)
would have led to the same conclusion, this operation of addition is well
defined.
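The claim that the sum of two classes does not depend on the representatives chosen is easy to test by brute force. A minimal sketch, using the representatives listed above for the classes of 2 and 3; the function name `cls` is invented here.

```python
def cls(n, modulus=4):
    """Label of the equivalence class of n modulo 4 (a number in 0..3)."""
    return n % modulus

# Some representatives of the classes of 2 and of 3.
reps_2 = [2, 6, -2, 10, -6]
reps_3 = [3, 7, -1, 11, -5]

# Whatever representatives are chosen, the sum lands in the same class.
sums = {cls(a + b) for a in reps_2 for b in reps_3}
assert sums == {1}
```

Python's `%` operator returns a non-negative remainder for a positive modulus, so negative representatives such as -2 are labelled correctly.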



Similarly, we simply make the definition $\bar{v}_1 + \bar{v}_2 = \overline{(v_1 + v_2)}$; i.e., add any two vectors
from the classes $\bar{v}_1$ and $\bar{v}_2$, and find the class to which the sum belongs. Suppose we
choose $v_1 + w_1$ from $\bar{v}_1$ and $v_2 + w_2$ from $\bar{v}_2$, where $w_1$ and $w_2$ are arbitrary elements
of W. Then the sum $\bar{v}_1 + \bar{v}_2$ is the equivalence class containing $(v_1 + v_2) + (w_1 + w_2)$,
which is $\overline{(v_1 + v_2)}$, no matter what choice of $w_1$ and $w_2$ may have been made.
This operation of addition is illustrated geometrically in figure 10.3. The point is
that $\bar{v}_1 + \bar{v}_2 = \bar{v}_3$, no matter whether $v_1$ and $v_2$ or $u_1$ and $u_2$ are chosen as
representatives of the classes $\bar{v}_1$ and $\bar{v}_2$.

Figure 10.3

We define multiplication of an equivalence class by a scalar in a similar way:
$c\bar{v}_1 = \overline{(c v_1)}$. That is, multiply any element of $\bar{v}_1$ by c, and take the equivalence class of
the result. Because W is a subspace, the result is unique.

To find a basis for V/W, we choose a basis $\{w_1, \ldots, w_k\}$ for W and extend it to a
basis for V by adjoining vectors
$v_1, v_2, \ldots, v_{n-k}$ (not elements of W). We claim that the equivalence classes
$\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{n-k}$ form a basis for V/W. To prove this, we must show that they are
independent and that they span V/W.

Let us deal first with the question of independence. Suppose that $\bar{v}_1, \ldots, \bar{v}_{n-k}$
were not independent. Then constants $\lambda_1, \lambda_2, \ldots, \lambda_{n-k}$, not all zero, exist such that

$\lambda_1 \bar{v}_1 + \lambda_2 \bar{v}_2 + \cdots + \lambda_{n-k} \bar{v}_{n-k} = \bar{0}$

which implies that

$\lambda_1 v_1 + \lambda_2 v_2 + \cdots + \lambda_{n-k} v_{n-k} \in W$

contradicting the assumption that the set of vectors $\{w_1, \ldots, w_k; v_1, \ldots, v_{n-k}\}$ is
linearly independent.

We can write a vector $v \in V$ as a linear combination of basis elements:

$v = x_1 v_1 + x_2 v_2 + \cdots + x_{n-k} v_{n-k}$ + element of W

which implies that

$\bar{v} = x_1 \bar{v}_1 + x_2 \bar{v}_2 + \cdots + x_{n-k} \bar{v}_{n-k}$.

This proves that the equivalence classes $\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{n-k}$ span the space V/W. We
conclude that $\bar{v}_1, \bar{v}_2, \ldots, \bar{v}_{n-k}$ form a basis for V/W, and that

$\dim(V/W) = \dim V - \dim W$.

The time has come for some examples of quotient spaces.
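As a warm-up, here is a minimal numerical sketch of a quotient space, assuming $V = \mathbb{R}^2$ and W the line spanned by (1, 1); these choices are made here for illustration and are not one of the numbered examples. Each equivalence class is a line parallel to W, and the linear function $x - y$, which vanishes on W, serves as a label for the class.

```python
W_DIRECTION = (1, 1)  # W is the line spanned by this vector

def label(v):
    """x - y is constant on each line parallel to W, so it labels the class of v."""
    return v[0] - v[1]

def add(u, v):
    return (u[0] + v[0], u[1] + v[1])

def shift(v, s):
    """Another representative of the class of v: v + s*w with w in W."""
    return (v[0] + s * W_DIRECTION[0], v[1] + s * W_DIRECTION[1])

v1, v2 = (2, 0), (0, 3)
expected = label(add(v1, v2))

# The class of the sum does not depend on the representatives chosen.
for s in range(-3, 4):
    for t in range(-3, 4):
        u1, u2 = shift(v1, s), shift(v2, t)
        assert label(u1) == label(v1)
        assert label(add(u1, u2)) == expected
```

Because a single number labels each class, the quotient is one-dimensional here, in agreement with dim V - dim W = 2 - 1.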


Example 1. V is U 3 , IT is the one-dimensional subspace spanned by 1 J. Since 

f'\ M —M-M—M- 

I 1 Kill, and I 0 I span R 3 , we can choose I 1 I and I 0 I as a basis for the two- 

W \o/ \i / \o/ W 

dimensional quotient space V/W. Now, for example, 

(:)- 3 0- 2 ©*0 


0)-<D + © 

rn a similar manner we can express the equivalence class of any ve 



M 


Notic e , incid e ntally, that ( 0 ) and 

, 0 , 


would not serve as a basis for V/W. 


w 


Because their sum is an element of W, they are not linearly independent elements of
V/W:


0+1 = 1 
,0/ Vo/ Vo, 


so I 0 1 + I 1 1 = 0. 

, 0 / Vo, 


Example 2. V is the space of polynomials $f(t)$ of degree $\le 2$; W is the
two-dimensional subspace of such polynomials satisfying the additional condition
$f(1) = 0$. A basis for W is $f_1(t) = 1 - t$ and $f_2(t) = 1 - t^2$. A basis for V/W is the
equivalence class $\bar{1}$. In this case, the general element of V is

$f(t) = A + Bt + Ct^2$

so

$f(t) = A + B + C - B(1 - t) - C(1 - t^2)$.

This means that

$\bar{f} = (A + B + C)\,\bar{1}$.

If you think of elements of V/W as planes, this is obvious: the subspace W is the plane
$f(1) = 0$, and the equivalence class of any other function $f(t)$ is determined by the
value of $f(1)$.
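The decomposition used in Example 2 can be verified mechanically. The sketch below assumes that a polynomial $A + Bt + Ct^2$ is stored as the tuple `(A, B, C)`; the helper names `decompose` and `recombine` are invented for this illustration.

```python
def evaluate(coeffs, t):
    """Value of A + B*t + C*t**2 at t."""
    A, B, C = coeffs
    return A + B * t + C * t * t

def decompose(coeffs):
    """Coefficients of f along 1, (1 - t) and (1 - t**2)."""
    A, B, C = coeffs
    return (A + B + C, -B, -C)

def recombine(parts):
    """Expand k*1 + b*(1 - t) + c*(1 - t**2) back into coefficients of 1, t, t**2."""
    k, b, c = parts
    return (k + b + c, -b, -c)

f = (4, -7, 9)
parts = decompose(f)

# The decomposition reproduces f, and the multiple of 1 is exactly f(1).
assert recombine(parts) == f
assert parts[0] == evaluate(f, 1)
```

Since $1 - t$ and $1 - t^2$ lie in W, only the first entry of `parts` survives in the quotient, which is the statement that the class of f is determined by $f(1)$.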


We can now put together the concepts of dual space and quotient space to obtain
a powerful result. Earlier, we found that, if W is a subspace of V, the annihilator space
$W^\perp$ is a subspace of V*. Now suppose that $\alpha$ is an element of $W^\perp$, and consider its
action on an equivalence class $\bar{v}$. Because $\alpha[w] = 0$ for all $w \in W$ and $\alpha$ is linear,

$\alpha[v + w] = \alpha[v]$.

That is, $\alpha$ has the same value on any element in an equivalence class, and it can
therefore be regarded as a linear function on the space of equivalence classes.
Conversely, any linear function $\beta$ on V/W can be regarded as a linear function on
V. Simply define $\beta[v]$ as

$\beta[v] = \beta[\bar{v}]$.

Then $\beta[w] = \beta[\bar{0}] = 0$ for every $w \in W$. So $\beta$ is an element of $W^\perp$. We can therefore
identify $W^\perp$ with the dual space of the quotient space V/W. Recall that both $W^\perp$ and
V/W have dimension equal to dim V - dim W.

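Similarly, we may consider the quotient space $V^*/W^\perp$, whose elements are
equivalence classes $\bar{\beta}$ whose elements differ by elements of $W^\perp$, i.e.,

\[ \bar{\beta} = \{\beta + \alpha : \alpha \in W^\perp\}. \]

All elements of $\bar{\beta}$ take the same values on $W$, since $(\beta + \alpha)[\mathbf{w}] = \beta[\mathbf{w}]$ for
$\mathbf{w} \in W$, so $\bar{\beta}$ defines a linear function on $W$. Suppose this function is zero, that is,
$\beta[\mathbf{w}] = 0$ for all $\mathbf{w} \in W$. This says that $\beta \in W^\perp$ or $\bar{\beta} = 0$. So $\bar{\beta}$ is completely
determined by the linear function it defines on $W$.

Conversely, we claim that every $\gamma \in W^*$ is of the form $\gamma = \bar{\beta}$ for some $\beta \in V^*$.
Indeed choose a basis $\mathbf{w}_1, \ldots, \mathbf{w}_k$ of $W$ and extend it to a basis
$\mathbf{w}_1, \ldots, \mathbf{w}_k, \mathbf{v}_1, \ldots, \mathbf{v}_{n-k}$ of $V$.
Let $\beta$ be any linear function with $\beta[\mathbf{w}_i] = \gamma[\mathbf{w}_i]$ for all $i$ and let $\beta$ take any
values on the $\mathbf{v}$'s. Then $\bar{\beta} = \gamma$. We can therefore identify the space of these equivalence
classes, the quotient space $V^*/W^\perp$, with the dual space $W^*$.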

The results just proved may be summarized in the following diagram: 

\[
\begin{array}{ccccc}
V^*/W^\perp & \leftarrow & V^* & \leftarrow & W^\perp \\
W & \rightarrow & V & \rightarrow & V/W
\end{array}
\]

Here the spaces which are dual to one another are arranged vertically: $V$ and $V^*$
(dimension $n$) are dual, $W$ and $V^*/W^\perp$ (dimension $k$) are dual, $V/W$ and $W^\perp$
(dimension $n - k$) are dual.

Much of linear algebra and its applications to electric network theory rests on this
single theorem, which deserves your most careful consideration.

As an illustration of the theorem, let $V$ be the space of polynomials $f(t)$ of degree
$\le 2$, and let $W$ be the two-dimensional subspace of even polynomials. Then $V/W$ is
one-dimensional. A basis element, which we shall call $\mathbf{h}_0$, is the equivalence class
of the function $f(t) = t$. Thus if $f(t) = A + Bt + Ct^2$, $\bar{f} = B\mathbf{h}_0$.

In this case, the annihilator space $W^\perp$ is also one-dimensional. One choice for a
basis element is the linear operator

\[ \alpha: f(t) \mapsto \int_{-1}^{1} t f(t)\,dt. \]

Since

\[ \int_{-1}^{1} t(A + Bt + Ct^2)\,dt = \tfrac{2}{3}B \]

we see that $\alpha$ does indeed annihilate any even polynomial and assign the value $\tfrac{2}{3}$ to
the polynomial $f(t) = t$, which specifies the basis $\mathbf{h}_0$ of $V/W$. That is, $\tfrac{3}{2}\alpha$ is the
basis element dual to $\mathbf{h}_0$.
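
The integral in this illustration can be checked mechanically. The following is a minimal sketch, not part of the original text: the function name `alpha` and the coefficient-list representation `[A, B, C]` are assumptions made for illustration. It integrates $t f(t)$ over $[-1, 1]$ exactly, term by term.

```python
from fractions import Fraction

def alpha(coeffs):
    """Return the integral of t*f(t) over [-1, 1], where f has coefficients [A, B, C, ...]."""
    total = Fraction(0)
    for k, c in enumerate(coeffs):
        n = k + 1  # the term c*t**k becomes c*t**n after multiplying by t
        # The integral of t**n over [-1, 1] is 0 for odd n and 2/(n+1) for even n.
        if n % 2 == 0:
            total += Fraction(c) * Fraction(2, n + 1)
    return total

print(alpha([5, 0, 7]))  # an even polynomial, 5 + 7t^2
print(alpha([0, 1, 0]))  # the polynomial f(t) = t
print(alpha([4, 3, 9]))  # a general polynomial with B = 3
```

Only the coefficient $B$ survives, which is why $\alpha$ is well defined on the equivalence classes in $V/W$.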

We can extend $\alpha$ to a complete basis for $V^*$ by adjoining the basis elements
$\beta^1$ and $\beta^2$ whose effect is to pick out the coefficients $A$ and $C$ respectively. That is,

\[ \beta^1[A + Bt + Ct^2] = A, \qquad \beta^2[A + Bt + Ct^2] = C. \]

The equivalence classes $\bar{\beta}^1$ and $\bar{\beta}^2$, which form a basis for $V^*/W^\perp$, clearly also form a
basis for $W^*$.





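\[ A: V \to W \]

where $V$ is a vector space of dimension $m$ and $W$ is a space of dimension $n$. As always,
to state that $A$ is linear means that $A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2$.
Associated with a linear transformation $A: V \to W$ are two subspaces, the kernel of
$A$ and the image of $A$.

The kernel of $A$, denoted $\ker A$, is the set of vectors $\mathbf{v} \in V$ such that $A\mathbf{v} = 0$. To verify
that $\ker A$ is a subspace of $V$, we note that, if $\mathbf{v}_1$ and $\mathbf{v}_2$ are in the kernel of $A$, then
$A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 = 0$, so that $c_1\mathbf{v}_1 + c_2\mathbf{v}_2 \in \ker A$ also. This proves that
$\ker A$ is closed under linear combinations, and hence a subspace.

The image of $A$, denoted $\operatorname{im} A$, is the set of vectors $\mathbf{w} \in W$ which are of the form $A\mathbf{v}$
for some $\mathbf{v} \in V$. If $\mathbf{w}_1$ and $\mathbf{w}_2$ are vectors in $\operatorname{im} A$, then $\mathbf{w}_1 = A\mathbf{v}_1$ and $\mathbf{w}_2 = A\mathbf{v}_2$ for some
$\mathbf{v}_1, \mathbf{v}_2 \in V$. Because of the linearity of $A$,

\[ A(c_1\mathbf{v}_1 + c_2\mathbf{v}_2) = c_1A\mathbf{v}_1 + c_2A\mathbf{v}_2 = c_1\mathbf{w}_1 + c_2\mathbf{w}_2 \]

so that $c_1\mathbf{w}_1 + c_2\mathbf{w}_2$ is also an element of $\operatorname{im} A$. This proves that $\operatorname{im} A$ is a subspace
of $W$.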

The dimensions of $\ker A$ and of $\operatorname{im} A$ are related by the equation

\[ \dim(\operatorname{im} A) + \dim(\ker A) = \dim V. \tag{10.8} \]

The dimension of the image of $A$ is called the rank of $A$, the dimension of the kernel of
$A$ is called the nullity of $A$, and equation (10.8) is called the rank-nullity theorem.
You are already familiar with the theorem in the special case of transformations
of the plane into itself, for which there are three possibilities:



(1) A has the entire plane as its image and carries no non-zero vector into the 
origin (rank 2, nullity 0). 

(2) A collapses the plane into a line, and carries a line into the origin (rank 1, 
nullity 1). 

(3) A collapses the entire plane into the origin (rank 0, nullity 2). 

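To prove the rank-nullity theorem we choose a convenient basis for $V$. Suppose
that $\dim V = n$, $\dim(\ker A) = k$. We choose a basis $\{\mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\}$ for $\ker A$, then
extend this to a basis for all of $V$ by choosing $r = n - k$ vectors $\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r$. For
convenience, we order this basis as

\[ \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r;\ \mathbf{u}_1, \mathbf{u}_2, \ldots, \mathbf{u}_k\} \]

so that the first $r$ vectors in the basis do not lie in $\ker A$. The problem is now to show
that the vectors $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$ form a basis for $\operatorname{im} A$.
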

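We first show that the vectors $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$ are linearly independent.
Suppose that

\[ \sum_{i=1}^{r} \lambda_i A\mathbf{v}_i = 0. \]

Because $A$ is linear, this is the same as

\[ A\Bigl(\sum_{i=1}^{r} \lambda_i \mathbf{v}_i\Bigr) = 0 \]

which implies that $\sum_{i=1}^{r} \lambda_i\mathbf{v}_i \in \ker A$. But the vectors $\{\mathbf{v}_i\}$, along with the basis $\{\mathbf{u}_j\}$ for
$\ker A$, form a basis for $V$. Therefore

\[ \sum_{i=1}^{r} \lambda_i \mathbf{v}_i = \mathbf{u} \quad\text{with } \mathbf{u} \in \ker A \]

implies that all the $\lambda_i$ are zero, and therefore that $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$ are linearly
independent.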

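To show that the vectors $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$ span $\operatorname{im} A$, we consider an arbitrary
vector $\mathbf{w} \in \operatorname{im} A$. There is some vector $\mathbf{v} \in V$ such that $\mathbf{w} = A\mathbf{v}$. We can write

\[ \mathbf{v} = \sum_{i=1}^{r} a_i\mathbf{v}_i + \sum_{j=1}^{k} b_j\mathbf{u}_j. \]

But $A\mathbf{u}_j = 0$ for each $j$, so

\[ \mathbf{w} = A\Bigl(\sum_{i=1}^{r} a_i\mathbf{v}_i + \sum_{j=1}^{k} b_j\mathbf{u}_j\Bigr) = \sum_{i=1}^{r} a_i(A\mathbf{v}_i), \]

i.e., any $\mathbf{w}$ can be written as a linear combination of $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$.

We conclude that the $r$ vectors $\{A\mathbf{v}_1, A\mathbf{v}_2, \ldots, A\mathbf{v}_r\}$ form a basis for $\operatorname{im} A$. It follows
that $\dim(\operatorname{im} A) = r$. But $r = n - k$, where $\dim V = n$ and $\dim(\ker A) = k$. Thus
$\dim(\operatorname{im} A) = \dim V - \dim(\ker A)$, which is the rank-nullity theorem.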

The rank-nullity theorem provides a proof of a result which you have probably
already conjectured about the annihilator space of a subspace. Suppose that
$\{\alpha^1, \alpha^2, \ldots, \alpha^n\}$ are elements of $V^*$. Then we can define a linear transformation

\[ A: V \to \mathbb{R}^n \]

by

\[ A\mathbf{v} = \begin{pmatrix} \alpha^1[\mathbf{v}] \\ \alpha^2[\mathbf{v}] \\ \vdots \\ \alpha^n[\mathbf{v}] \end{pmatrix}. \]

The vectors $\{\alpha^1, \alpha^2, \ldots, \alpha^n\}$ span a subspace $U^\perp \subset V^*$ which annihilates the subspace
$\ker A$. The rank-nullity theorem says that $\dim(\operatorname{im} A) = \dim V - \dim(\ker A)$. But we
saw in section 10.4 that

\[ \dim(U^\perp) = \dim V - \dim(\ker A). \]

It follows that

\[ \dim(\operatorname{im} A) = \dim U^\perp. \]

In terms of matrices, each element $\alpha^i$ is a row of the matrix, and $\dim U^\perp$ is the
dimension of the subspace of $V^*$ spanned by the rows of the matrix. On the other
hand, $\dim(\operatorname{im} A)$ is the dimension of the subspace of $W$ spanned by the columns of the
matrix. Both of these numbers equal $r$, the rank of the matrix.

This view of the rows of a matrix as elements of the dual space $V^*$ is particularly
useful when we are trying to solve systems of linear equations. For example, the
system of equations

\[
\begin{aligned}
x + y + z &= 0, \\
x + 2y + 3z &= 0, \\
2x + 3y + 4z &= 0
\end{aligned}
\]

may be represented as $A\mathbf{v} = 0$ where

\[ A = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 2 & 3 \\ 2 & 3 & 4 \end{pmatrix}. \]

Here the rows of $A$ are associated with individual equations. Because the third
equation is the sum of the first two, the three rows span only a two-dimensional
subspace of $V^*$, the rank of the matrix $A$ is 2 and its nullity is $3 - 2 = 1$. Therefore
there exist nontrivial solutions to the equation $A\mathbf{v} = 0$.
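
The rank and nullity claimed for this system can be confirmed by exact elimination. The sketch below is illustrative only: the helper name `rank` is an assumption, and it uses ordinary forward elimination over fractions rather than any procedure specific to this text. It also checks that the rows and the columns span spaces of the same dimension.

```python
from fractions import Fraction

def rank(rows):
    """Compute the rank of a matrix by forward elimination with exact fractions."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0  # number of pivot rows found so far
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][col] / m[r][col]
            m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

A = [[1, 1, 1], [1, 2, 3], [2, 3, 4]]
A_transpose = [list(col) for col in zip(*A)]
rk = rank(A)
nullity = len(A[0]) - rk
print(rk, nullity, rank(A_transpose))

# One nontrivial solution of Av = 0, found by hand from the first two equations.
v = [1, -2, 1]
print([sum(a * x for a, x in zip(row, v)) for row in A])
```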


10.8. Row reduction

Consider now a linear transformation

\[ T: V \to W \]

where $V$ is $m$-dimensional, $W$ is $n$-dimensional, and the rank of $T$ is $r$. We have seen
that by a proper choice of basis for $V$ and $W$ we can assure that $T$ has an especially
simple matrix representation. We simply choose as a basis for $V$ the vectors

\[ \{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_r;\ \mathbf{v}_{r+1}, \ldots, \mathbf{v}_m\} \]

where the last $m - r$ basis vectors form a basis for $\ker T$, so that $T\mathbf{v}_{r+1} =
T\mathbf{v}_{r+2} = \cdots = T\mathbf{v}_m = 0$. Then $T\mathbf{v}_1, T\mathbf{v}_2, \ldots, T\mathbf{v}_r$ form a basis for $\operatorname{im} T$. We choose
$\mathbf{w}_1 = T\mathbf{v}_1, \mathbf{w}_2 = T\mathbf{v}_2, \ldots, \mathbf{w}_r = T\mathbf{v}_r$ as a basis for $\operatorname{im} T$, then extend to a basis for all of
$W$. Now the matrix representation of $T$ relative to this basis is simply the matrix

\[ \begin{pmatrix} I_r & 0 \\ 0 & 0 \end{pmatrix} \quad (n \text{ rows},\ m \text{ columns}) \tag{10.9} \]

which has a string of $r$ 1s down the diagonal from the upper left-hand corner and all
its other entries zero.

Usually, alas, the transformation $T$ is described by a matrix $A$ which represents it
relative to some other, less convenient basis. An important computational problem
is then to find the change of basis for $V$ and $W$ which converts the matrix
representation relative to the given basis, $A$, to the form (10.9). In practice this is most efficiently
achieved by the algorithm of row reduction, which is in essence just a systematic
procedure for solving linear equations by the familiar process of elimination. We
first describe the process and illustrate it, then explain why it solves the general
problem.

Suppose we are given the matrix

\[ M = \begin{pmatrix} 0 & 4 & -4 & 8 \\ 2 & 4 & 0 & 2 \\ 3 & 0 & 6 & -9 \end{pmatrix}. \]

The index of any non-zero row of M is the position of the first non-vanishing 
entry, and this entry is called the leading entry. Thus, for the first row of M, the index 
is 2 and the leading entry is 4, while for the third row the index is 1 and the leading 
entry is 3. 

The first step in row reduction is to locate a row of smallest index, to move it to the
top position by interchanging it with the top row if necessary, and to divide it by its
leading entry. For the given matrix $M$, we interchange the first and second rows to
obtain

\[ \begin{pmatrix} 2 & 4 & 0 & 2 \\ 0 & 4 & -4 & 8 \\ 3 & 0 & 6 & -9 \end{pmatrix} \]

and we then divide the top row by its leading entry, 2, to obtain

\[ \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 4 & -4 & 8 \\ 3 & 0 & 6 & -9 \end{pmatrix}. \]

The second step is to clear the column under the leading entry of the top row. This is
done by subtracting the appropriate multiple of the top row from each other row
in turn. In our example, we subtract 3 times the top row from the third row,
obtaining

\[ \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 4 & -4 & 8 \\ 0 & -6 & 6 & -12 \end{pmatrix}. \]

The matrix now has a leading entry one in the top row, and all other rows which
are not zero have an index greater than the index of the top row. We next move a row
of next smallest index to the second position and divide by its leading entry. In the
example, the second row already has next smallest index, and we divide it by its
leading entry, 4, to obtain

\[ \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 1 & -1 & 2 \\ 0 & -6 & 6 & -12 \end{pmatrix}. \]

We now clear the column corresponding to the leading entry in the second row by
subtracting a suitable multiple of the second row from all other rows. In the example,
we subtract twice the second row from the first and subtract $-6$ times the second
row from the third, obtaining

\[ \begin{pmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \tag{10.10} \]


Now the first and second columns both contain just a single 1, which is the leading 
entry of a row. 

In the general case, we now again interchange rows, if necessary, to move a row of
next smallest index to the third position, divide this row by its leading entry, and
subtract multiples of it from all other rows to clear the column of the leading entry.
Eventually there are no more non-zero rows, and we have a matrix in row-reduced
form. In the example this has already happened. Note the following features of a
row-reduced matrix such as given in (10.10).

(a) All zero rows, if any, are at the bottom.

(b) The non-zero rows are arranged in order of increasing index.

(c) Every column containing the leading entry of a non-zero row has a one as
its leading entry and zeros elsewhere.
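
The procedure just described can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the authors' own formulation: the function name `row_reduce` is hypothetical, and scanning the columns from left to right is used as an equivalent way of locating "a row of smallest index". Exact fractions avoid rounding error.

```python
from fractions import Fraction

def row_reduce(rows):
    """Return the row-reduced form of a matrix given as a list of rows."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    top = 0  # position that the next leading row will occupy
    for col in range(n_cols):
        if top == n_rows:
            break
        # A row at or below `top` with a non-zero entry in this column has smallest index.
        pivot = next((i for i in range(top, n_rows) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[top], m[pivot] = m[pivot], m[top]       # interchange rows
        lead = m[top][col]
        m[top] = [x / lead for x in m[top]]       # divide by the leading entry
        for i in range(n_rows):                   # clear the rest of the column
            if i != top and m[i][col] != 0:
                factor = m[i][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[top])]
        top += 1
    return m

M = [[0, 4, -4, 8], [2, 4, 0, 2], [3, 0, 6, -9]]
B = row_reduce(M)
for row in B:
    print([str(x) for x in row])
```

Applied to the matrix $M$ of the worked example, the sketch reproduces the row-reduced matrix with rows $(1, 0, 2, -3)$, $(0, 1, -1, 2)$ and $(0, 0, 0, 0)$.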

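Each operation in the row-reduction process can be achieved by left multiplication
by an invertible $n \times n$ matrix. For example, multiplying on the left by the
matrix

\[ S_1 = \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

interchanges the first and second rows:

\[ \begin{pmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 0 & 4 & -4 & 8 \\ 2 & 4 & 0 & 2 \\ 3 & 0 & 6 & -9 \end{pmatrix} = \begin{pmatrix} 2 & 4 & 0 & 2 \\ 0 & 4 & -4 & 8 \\ 3 & 0 & 6 & -9 \end{pmatrix}. \]

Multiplying on the left by the matrix

\[ S_2 = \begin{pmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \]

divides the first row by 2:

\[ \begin{pmatrix} \tfrac{1}{2} & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 2 & 4 & 0 & 2 \\ 0 & 4 & -4 & 8 \\ 3 & 0 & 6 & -9 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 4 & -4 & 8 \\ 3 & 0 & 6 & -9 \end{pmatrix}. \]

Multiplying on the left by the matrix

\[ S_3 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix} \]

subtracts three times the first row from the third:

\[ \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -3 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 4 & -4 & 8 \\ 3 & 0 & 6 & -9 \end{pmatrix} = \begin{pmatrix} 1 & 2 & 0 & 1 \\ 0 & 4 & -4 & 8 \\ 0 & -6 & 6 & -12 \end{pmatrix}. \]

Thus we can write the final row-reduced matrix $B$ as

\[ B = S_k S_{k-1} \cdots S_3 S_2 S_1 M \]

or as

\[ B = SM \]

where $S$ is an invertible $n \times n$ matrix. Notice that, since $S$ is invertible, $\dim
\operatorname{im} B = \dim \operatorname{im} M$.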
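
The claim that each row operation is a left multiplication can be verified directly. The sketch below is illustrative: the helper `matmul` and the variable names are assumptions. It multiplies the three elementary matrices described above into $M$ in order.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows, using exact fractions."""
    return [[sum(Fraction(a) * b for a, b in zip(row, col)) for col in zip(*Y)]
            for row in X]

M = [[0, 4, -4, 8], [2, 4, 0, 2], [3, 0, 6, -9]]
S1 = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]               # interchange rows 1 and 2
S2 = [[Fraction(1, 2), 0, 0], [0, 1, 0], [0, 0, 1]]  # divide row 1 by 2
S3 = [[1, 0, 0], [0, 1, 0], [-3, 0, 1]]              # subtract 3 * row 1 from row 3

step1 = matmul(S1, M)
step2 = matmul(S2, step1)
step3 = matmul(S3, step2)
for row in step3:
    print([str(x) for x in row])
```

The product $S_3 S_2 S_1 M$ agrees with the matrix obtained after the first three row operations of the worked example.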

The image and kernel of the row-reduced matrix $B$ are easy to determine. Clearly
the image is the $r$-dimensional subspace corresponding to the $r$ non-zero rows of $B$,
spanned by the columns

\[ \begin{pmatrix} 1 \\ 0 \\ 0 \\ \vdots \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \\ 0 \\ \vdots \end{pmatrix}, \quad \ldots \]

which contain the leading entries of all the non-zero rows. By the rank-nullity
theorem, the kernel of $B$ has dimension $m - r$, equal to the number of columns that
do not contain leading entries. To find a basis for $\ker B$, we consider vectors which
have a 1 in one of the $m - r$ positions corresponding to the columns without leading
entries, and zeros in the positions corresponding to the other such columns.

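For example, with

\[ B = \begin{pmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]

a basis for $\operatorname{im} B$ is clearly $\begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix}$ and $\begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix}$. The columns without leading entries are the
third and fourth, so we search for basis vectors of $\ker B$ which have the form

\[ \mathbf{u}_1 = \begin{pmatrix} x_1 \\ y_1 \\ 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \mathbf{u}_2 = \begin{pmatrix} x_2 \\ y_2 \\ 0 \\ 1 \end{pmatrix}. \]

Setting $B\mathbf{u}_1 = 0$, we find

\[ x_1 + 2 = 0, \qquad y_1 - 1 = 0 \]

so

\[ \mathbf{u}_1 = \begin{pmatrix} -2 \\ 1 \\ 1 \\ 0 \end{pmatrix}. \]

Similarly, setting $B\mathbf{u}_2 = 0$ gives

\[ x_2 - 3 = 0, \qquad y_2 + 2 = 0 \]

so

\[ \mathbf{u}_2 = \begin{pmatrix} 3 \\ -2 \\ 0 \\ 1 \end{pmatrix}. \]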
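
The recipe for a kernel basis (put a 1 in one non-leading position, zeros in the other non-leading positions, and solve for the leading positions) can be sketched as follows. The function name `kernel_basis` is hypothetical, and the input is assumed to be already row reduced.

```python
from fractions import Fraction

def kernel_basis(B):
    """Return a basis of ker B for a matrix B that is already in row-reduced form."""
    n_cols = len(B[0])
    leading = {}  # maps the column of each leading entry to its row
    for i, row in enumerate(B):
        for j, x in enumerate(row):
            if x != 0:
                leading[j] = i
                break
    basis = []
    for free in range(n_cols):
        if free in leading:
            continue
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)  # 1 in this non-leading position, 0 in the others
        for col, row in leading.items():
            # Each non-zero row reads: (leading variable) + B[row][free] * 1 = 0.
            v[col] = -Fraction(B[row][free])
        basis.append(v)
    return basis

B = [[1, 0, 2, -3], [0, 1, -1, 2], [0, 0, 0, 0]]
u1, u2 = kernel_basis(B)
print([str(x) for x in u1], [str(x) for x in u2])
```

For the matrix $B$ of the example this yields the vectors $(-2, 1, 1, 0)$ and $(3, -2, 0, 1)$.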

Of course, we were interested in the kernel and image of the original matrix $A$, not
of the row-reduced matrix $B$. However, $B = SA$, where $S$ is invertible, so

\[ A = S^{-1}B. \]

Clearly any vector in the kernel of $B$ is also in the kernel of $A$, so by finding the kernel
of $B$ we have also found the kernel of $A$. To find the image of $A$ we must invert $S$ and
let $S^{-1}$ act on the image of $B$. To summarize: if $B = SA$, then $\ker B = \ker A$, and
$\dim \operatorname{im} B = \dim \operatorname{im} A$.

Suppose now that we wish to solve an equation of the form 

Ay = w. 

We apply the operations of row reduction both to the matrix A and to the vector w, 
obtaining 

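\[ SA\mathbf{v} = S\mathbf{w} \quad\text{or}\quad B\mathbf{v} = \mathbf{u} \]

where $B$ is row-reduced and $\mathbf{u} = S\mathbf{w}$. This equation is of a form like

\[ \begin{pmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \\ v_4 \end{pmatrix} = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \end{pmatrix} \]

and it can be solved by inspection, as follows.

(1) If any component of $\mathbf{u}$ corresponding to a zero row of $B$ is different from
zero, the equation has no solution.

(2) If the components of $\mathbf{u}$ corresponding to the zero rows of $B$ are all zero, then

\[ \mathbf{v}_0 = \begin{pmatrix} u_1 \\ u_2 \\ 0 \\ 0 \end{pmatrix} \]

is one solution to the equation.

(3) The general solution to the equation is of the form $\mathbf{v}_0 + \mathbf{v}$, where $\mathbf{v} \in \ker A$.
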

In practice, before applying the row-reduction procedure, it is convenient to
combine the matrix $A$ and the vector $\mathbf{w}$ into a single array so that row reduction can
be applied to both at once. Here is an example of the complete process. We wish to
solve

\[ A\mathbf{v} = \mathbf{w}, \]

where



A=\ 


3 

1 


2 

■2 


and w = 


We combine A and w into the array 


2 4 2 2 
13 2 0 

3 1-28 



and apply row reduction, obtaining successively 



One solution 




tr 


U4 


To find the general solution, we must construct a basis for the kernel of A. One basis 

l'\ 

vector, with one in the third position and zero in the fourth, is 


with zero in the third position and one in the fourth, is 
solution to A\ = w is 


/-3\ 

1 
0 

l V 


\®/ 


The other, 


So the general 



1 A 


A 


(-A 


v = 

— 1 

4- 2 , 

- 1 

+ 3 

1 





1 

i / ^2 

0 



ts i 


It 

f 

t H 





where $\lambda_1$ and $\lambda_2$ are arbitrary real numbers. Note the characteristic form of the
solution in relation to the columns of the row-reduced matrix which do not contain
leading entries, here the third and fourth: the particular solution has zeros in the
third and fourth positions, and the basis vectors for
$\ker A$ each have zeros in all but one of these positions. There are many ways to write
the general solution to the equation, but this is the simplest.


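Row reduction also provides an
efficient method of matrix inversion. The transformation $S$ that row-reduces $A$ to
the identity matrix is just the matrix $A^{-1}$, and it can be calculated step by step if each
individual row-reduction operation is applied to a matrix that begins as the identity
matrix. Suppose, for example, that $A = \begin{pmatrix} 1 & 2 & 1 \\ 2 & 3 & 3 \\ -1 & -1 & 0 \end{pmatrix}$. We begin with $A$ and the
identity matrix,

\[ \left(\begin{array}{rrr|rrr} 1 & 2 & 1 & 1 & 0 & 0 \\ 2 & 3 & 3 & 0 & 1 & 0 \\ -1 & -1 & 0 & 0 & 0 & 1 \end{array}\right) \]

and apply row-reduction operations to both. Subtract twice row 1 from row 2; add
row 1 to row 3:

\[ \left(\begin{array}{rrr|rrr} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & -1 & 1 & -2 & 1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 \end{array}\right) \]

Divide row 2 by $-1$:

\[ \left(\begin{array}{rrr|rrr} 1 & 2 & 1 & 1 & 0 & 0 \\ 0 & 1 & -1 & 2 & -1 & 0 \\ 0 & 1 & 1 & 1 & 0 & 1 \end{array}\right) \]

Subtract twice row 2 from row 1; subtract row 2 from row 3:

\[ \left(\begin{array}{rrr|rrr} 1 & 0 & 3 & -3 & 2 & 0 \\ 0 & 1 & -1 & 2 & -1 & 0 \\ 0 & 0 & 2 & -1 & 1 & 1 \end{array}\right) \]

Divide row 3 by 2:

\[ \left(\begin{array}{rrr|rrr} 1 & 0 & 3 & -3 & 2 & 0 \\ 0 & 1 & -1 & 2 & -1 & 0 \\ 0 & 0 & 1 & -\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \end{array}\right) \]

Subtract 3 times row 3 from row 1; add row 3 to row 2:

\[ \left(\begin{array}{rrr|rrr} 1 & 0 & 0 & -\tfrac{3}{2} & \tfrac{1}{2} & -\tfrac{3}{2} \\ 0 & 1 & 0 & \tfrac{3}{2} & -\tfrac{1}{2} & \tfrac{1}{2} \\ 0 & 0 & 1 & -\tfrac{1}{2} & \tfrac{1}{2} & \tfrac{1}{2} \end{array}\right) \]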
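
The inversion procedure lends itself to a short program. The following is a sketch under stated assumptions: the function name `invert` is hypothetical, and it clears each column in a single pass, so the intermediate arrays differ in order from the hand computation although the final result is the same.

```python
from fractions import Fraction

def invert(A):
    """Invert a square matrix by row-reducing the array [A | I] with exact fractions."""
    n = len(A)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if aug[i][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        lead = aug[col][col]
        aug[col] = [x / lead for x in aug[col]]
        for i in range(n):
            if i != col:
                factor = aug[i][col]
                aug[i] = [a - factor * b for a, b in zip(aug[i], aug[col])]
    # The left block is now the identity; the right block is the inverse.
    return [row[n:] for row in aug]

A = [[1, 2, 1], [2, 3, 3], [-1, -1, 0]]
A_inv = invert(A)
for row in A_inv:
    print([str(x) for x in row])
```

Multiplying $A$ by the result gives the identity matrix, which is a useful check on any hand computation of this kind.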










Let us now return to the case of a general rectangular matrix. Instead of
performing row operations, we could perform column operations: just the same
operations as in row reduction, but with the word 'row' replaced by 'column'. We
would end up with a matrix

\[ C = MT \]

where $T$ is an invertible square matrix and $C$ is column reduced. That is:

(a') All zero columns of $C$, if any, are on the right;

(b') The non-zero columns of $C$ are arranged in order of increasing index (when
the index of a non-zero column is the position of the first non-vanishing
entry);

and

(c') Every row containing the leading entry of a non-zero column has a one as
its leading entry and zeros elsewhere.

Notice that

\[ \operatorname{im} C = \operatorname{im} M \]

and the $r$ non-zero columns of $C$ will be linearly independent. Hence they will give a
basis of $\operatorname{im} M$. Thus, to summarize:

To find a basis of $\operatorname{im} M$, apply column reduction. The non-zero columns
of the column-reduced matrix $C = MT$ form a basis of $\operatorname{im} M$.

To find a basis of $\ker M$ apply row reduction. The resulting rows of the
row-reduced matrix $B = SM$ give a set of $r$ equations for $\ker B$ which are in
'solved' form: solved for the positions of the columns containing leading
entries in terms of the remaining $m - r$ positions. A basis can be found by
successively choosing 1 for one of the remaining positions with the other
remaining positions zero and solving.


We can also perform both column and row operations. For example, suppose we
perform column operations to the row-reduced matrix

\[ \begin{pmatrix} 1 & 0 & 2 & -3 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]

Subtracting multiples of the first column from the third and fourth yields

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & -1 & 2 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]

and subtracting multiples of the second column from the third and fourth yields

\[ \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}. \]

In general, by performing column operations to the row-reduced matrix $B$, we can first
arrange (by switching columns) that the leading columns are exactly the first $r$
columns. (This step was not needed in the example above.) Then, successively
subtracting off multiples of each of the first $r$ columns from the remaining $m - r$
columns, we end up with a matrix whose only non-zero entries are $r$ 1s down the
principal diagonal, i.e., of the form (10.9). We have thus described an effective
algorithm for finding matrices $S$ and $T$ such that

\[ SMT \text{ has the form (10.9).} \]

10.9. The constant rank theorem*

If we combine the results of the preceding section with those of section 6.3, we obtain
some very powerful information about the behavior of differentiable maps. Let $V$
and $W$ be vector spaces of dimension $m$ and $n$ respectively. Let $O$ be some (open)
region in $V$ and suppose that

\[ f: O \to W \]

is a differentiable map. At each point $p \in O$ we can compute the differential, $df_p$, of $f$ at
$p$. The differential $df_p$ is a linear map of $V$ into $W$, and so we may compute its rank. Of
course, this rank depends on the point $p$. Our purpose is to prove the following


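The constant rank theorem. Suppose that the rank of $df_p$ is constant, equal to $r$, at all
points $p$ of $O$. Then about each point $x \in O$ we can find a one-to-one differentiable map
$\phi$ mapping a neighborhood of $x$ into $\mathbb{R}^m$ such that $\phi$ has a differentiable inverse, and a
one-to-one differentiable map $\psi$ mapping a neighborhood of $f(x)$ in $W$ into $\mathbb{R}^n$, also with
differentiable inverse, such that the composite map

\[ \psi \circ f \circ \phi^{-1} \]

is a linear map with matrix (10.9).

In short, this theorem says that, for differentiable maps of constant rank, the main
theorem of row reduction holds: we can 'make changes of variables', i.e., find maps $\phi$
and $\psi$, such that

\[ \psi \circ f \circ \phi^{-1} \]

is linear.

* This section can be omitted on first reading.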


and that we are interested in $f$ near 0. By row reduction, we can bring the linear map
$df_0$ to the form (10.9), that is, we can find invertible linear maps $R: W \to \mathbb{R}^n$ and
$S: V \to \mathbb{R}^m$ such that

\[ d(RfS^{-1})_0 = R\,df_0\,S^{-1} \]

has the form (10.9). Thus we can write

1 

* /s ~‘= /: J 


where the matrix 


\ 3x jJ 

is the ide ntity mat rix at O.Hence it i s invertible i n some neighborhood o f 0. Consider 


That is. the first 


nent 


whi le the last m — r 


m — r 


r 

rows 


0 I m — r 
1 / rows 


columns columns 

is invertible at 0, and hence is invertible near 0. We now apply the inverse function 
theorem of section 6.3 to the map g. Review the proof there to see that it was valid for 
arbitrary finite-dimensional vector spaces. (Observe also that the implicit function 
theorem and its proof are valid, where x and y are taken as vector variables and the 




to our current problem, we can tind an inverse tor g. Let 

ft = (R/'S- 1 )°g- 1 . 








ien 


Hence 


1r + 1 9 • • • 5 


d\ = 


y r 

9r+l(y) 


' 9n{y)l 

j n defined on iR m near 0. Now 
S~ * ) ° dg ~ 1 wh ere q = 


rankd/? ir =r. 


Looking at the matrix 


li n — r 


columns columns 

the only way that this can happen is if all the partial derivatives occurring in the
lower right-hand corner vanish identically. Thus the last $n - r$ $g$'s must depend only
on the first $r$ coordinates, $y$. That is,

\[ g_{r+1} = g_{r+1}(y_1, \ldots, y_r, 0, \ldots, 0), \quad\text{etc.} \]

Now introduce the transformation H on IR" given by 


(z .z„,0,...,0) 


u n unx* 13 * • * ’ * • * 9 






w 

Substituting the definition $h = (RfS^{-1}) \circ g^{-1}$ into $H \circ h$ gives

\[ H \circ h = (HR) \circ f \circ (S^{-1}g^{-1}). \]

Defining

\[ \psi = H \circ R \]

and

\[ \phi = g \circ S \]

shows that $\psi \circ f \circ \phi^{-1}$ has the desired form:



The most important application of this theorem is to the case where $r = n$, the
dimension of the image. If $df$ is continuous and has rank $n$ at some point $p$, then we
claim that it has rank $n$ at all points sufficiently close to $p$. Indeed, by row reduction,
we can find $R$ and $S$ such that $R\,df_p\,S^{-1}$ has the form (10.9) with $r = n$. Now, for $q$ close
to $p$, the upper left-hand block of $R\,df_q\,S^{-1}$ will be close to the identity matrix. Hence
the dimension of the image of $df_q$ is at least $n$. Since the dimension of this image
cannot exceed $\dim W = n$, we conclude that rank $df_q = n$ for all $q$ in some
neighborhood of $p$. From the constant rank theorem, we conclude:

The solution set theorem. Suppose that $f: O \to W$ is a continuously differentiable map
and that $df_p$ is surjective at $p$, that is, that rank $df_p = \dim W$. Then we can find
differentiable maps $\phi$ mapping a neighborhood of $p$ into $\mathbb{R}^m$ and $\psi$ mapping a
neighborhood of $f(p)$ into $\mathbb{R}^n$, both with continuously differentiable inverses, such that
$\phi(p) = 0$ in $\mathbb{R}^m$ and $\psi(f(p)) = 0$ in $\mathbb{R}^n$ and


-m 



1 


x r 


0 


\o / 

In particular, a point x is t/ie solution to the equation 

/(x)=/(p) x near p 

if and only if 

ll°\\ 


x = (f) 


- i 


0 

}m+ 1 




with y m+ + all near 0 . 


will make the statement 


theo rem more succinct. Let H w _„ denote the subspace of U m determined by the 
equations _ 


=0;...,x„^Q7 


(Here n is assumed to be <m.) A subset M of an m-dimensional vector space V is 
called a sub manifold of codimensi on m — n if it has theT ollowing pr operty: about each 


point xeM we can find a neighborhood 0 in V and a differentiable map (Jr.O ->U m 

in a one-to-one fashion into a neighborhood, U, of 


0, and <fi 1 is differentiable, and such that 


(p(MnO) = U 









In other words, the condition on M says that, near each of the points, we can find a 
ooth distortio 




rfi !7TFi ncTfTTi iciTil iTTnTTTTTHil irai iTTnTrnTTflTTiTJ ITnJ 



mamtoid oi codimension i in int, Decause we can introduce 


</>( x, vl == 

( r 2 — 1 ) 


'r \ 5 y ) 

^arctan (y/x) J 



at all points with $x \ne 0$ (followed by an appropriate shift in the vertical direction
to center the image at the origin). At the points $x = 0$, we can use $\arctan(x/y)$. The
perimeter of a square is not a submanifold, because there is no smooth way of
straightening out the corners.

Let $f: O \to W$ be a continuously differentiable map. A point $p \in O$ is called a regular
point of $f$ if $df_p$ is surjective; in other words, if rank $df_p = \dim W$. A point which
is not a regular point is called a critical point. If $W = \mathbb{R}$, then $p$ is a critical point
if $df_p = 0$. This agrees with our earlier notation.

A point $q \in W$ is called a regular value if all points $p$ in $f^{-1}(q)$ are regular points.
Then we can formulate our theorem as



Here are some examples.
(a) Take $n = 1$ and $f: \mathbb{R}^m \to \mathbb{R} = W$ given by $f(x) = x_1^2 + \cdots + x_m^2$. Then
$df_p \ne 0$ if $p \ne 0$. Thus any non-zero value of $f$ is a regular value. For $c > 0$, $f^{-1}(c)$
is the sphere of radius $\sqrt{c}$. Thus spheres are submanifolds.

(b) Let $M(k)$ denote the vector space of all $k \times k$ matrices, so $m = \dim V = k^2$.
Take $W$ to be the space of symmetric $k \times k$ matrices, and let

\[ f: V \to W, \qquad f(A) = AA^T. \]

Then

\[ df_A(B) = BA^T + AB^T \]

as you can easily check. We claim that the identity matrix, $I$, is a regular value
of $f$. We must show that $df_A$ is surjective if $AA^T = I$. That is, we must be able to
solve

\[ BA^T + AB^T = C \]

for $B$ given any symmetric matrix $C$. Indeed, take $B = \tfrac{1}{2}CA$. Then

\[
\begin{aligned}
BA^T + AB^T &= \tfrac{1}{2}CAA^T + \tfrac{1}{2}AA^TC^T \\
&= \tfrac{1}{2}C + \tfrac{1}{2}C^T \quad\text{since } AA^T = A^TA = I \\
&= C \quad\text{since } C = C^T.
\end{aligned}
\]

Thus the set of all orthogonal matrices, those satisfying $AA^T = I$, is a submanifold
of the space of all matrices.
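
The choice $B = \tfrac{1}{2}CA$ can be tested numerically. The sketch below is illustrative only: the particular orthogonal matrix `A` (a quarter-turn rotation) and the symmetric matrix `C` are arbitrary choices, and the helper names are assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    return [list(col) for col in zip(*X)]

A = [[Fraction(0), Fraction(1)], [Fraction(-1), Fraction(0)]]  # orthogonal: A A^T = I
C = [[Fraction(2), Fraction(3)], [Fraction(3), Fraction(5)]]   # any symmetric matrix

B = [[x / 2 for x in row] for row in matmul(C, A)]             # B = (1/2) C A
left = matmul(B, transpose(A))                                 # B A^T
right = matmul(A, transpose(B))                                # A B^T
total = [[p + q for p, q in zip(r1, r2)] for r1, r2 in zip(left, right)]
print(total == C)
```

The same check passes for any orthogonal `A` and symmetric `C`, which is exactly the surjectivity of $df_A$ onto the symmetric matrices.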


10.10. The adjoint transformation


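When studying transformations from one affine plane to another, we made use of
the concept of pullback of a function. Recall that a transformation $\phi$ from an affine
plane $A$ to another affine plane $B$ gave rise, in a natural way, to a linear
transformation $\phi^*$ from the functions on $B$ to the functions on $A$, as depicted in figure 10.5.

[Figure 10.5: the transformation $\phi$ carries a point $P$ of $A$ to the point $\phi(P)$ of $B$; the pullback $\phi^*$ carries a function $f$ on $B$ to the function $\phi^*f$ on $A$.]

Here $\phi^*f$ is the function on $A$ defined by

\[ (\phi^*f)(P) = f(\phi(P)). \]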

This concept of pullback can be extended immediately to the case of a
transformation from any vector space $V$ to any other vector space $W$. We will take special
interest in the pullback of linear functions on $W$ (i.e., of elements of the dual space
$W^*$) which arises as a consequence of a linear transformation $A$ from $V$ to $W$. In
this case the pullback transformation from $W^*$ to $V^*$ is called the adjoint of $A$.
We denote the adjoint by $A^*$, and define its action on an element $\beta \in W^*$ by

\[ (A^*\beta)[\mathbf{v}] = \beta[A\mathbf{v}]. \]

The proof that $A^*$, thus defined, is linear in $\beta$ is the same as the proof that
pullback in general is linear: if $\beta^1$ and $\beta^2$ are elements of $W^*$, then

\[
\begin{aligned}
(A^*(c_1\beta^1 + c_2\beta^2))[\mathbf{v}] &= (c_1\beta^1 + c_2\beta^2)[A\mathbf{v}] \\
&= c_1\beta^1[A\mathbf{v}] + c_2\beta^2[A\mathbf{v}] \\
&= c_1A^*\beta^1[\mathbf{v}] + c_2A^*\beta^2[\mathbf{v}]
\end{aligned}
\]

so that $A^*(c_1\beta^1 + c_2\beta^2) = c_1A^*\beta^1 + c_2A^*\beta^2$
and $A^*$ is linear. Notice that the linearity of $\beta$ and $A$ implies that $A^*\beta$ is a linear
function of $\mathbf{v}$, so that $A^*\beta$ does indeed lie in $V^*$.

It is crucial to observe that the adjoint $A^*$ acts 'in the opposite direction' to $A$.
Note carefully: if $A$ transforms a vector $\mathbf{v} \in V$ into a vector $A\mathbf{v} \in W$, the adjoint $A^*$
transforms a vector $\beta \in W^*$ into a vector $A^*\beta \in V^*$. This can be summarized in the
diagram

\[
\begin{array}{ccc}
V^* & \xleftarrow{\ A^*\ } & W^* \\
V & \xrightarrow{\ A\ } & W
\end{array}
\]

As an example of the adjoint transformation, let $V$ be the three-dimensional space
of even polynomials of degree $\le 4$, and let $W$ be the two-dimensional space of
odd polynomials of degree $\le 3$. Then the operation of differentiation defines a
linear transformation $D$ from $V$ to $W$:

\[ D: V \to W. \]

A typical element of $W^*$ is

\[ \beta: g \mapsto g(1). \]

For example, $\beta(t + 2t^3) = 1 + 2 = 3$. To calculate $D^*\beta$ we use the definition

\[ D^*\beta[f] = \beta[Df]. \]

In the case at hand, $Df = f'(t)$ and $\beta[Df] = f'(1)$. We conclude that $D^*\beta$ is the
linear function on $V$ (element of $V^*$):

\[ D^*\beta: f(t) \mapsto f'(1). \]

Another element of $W^*$ is

\[ \alpha: g \mapsto \int_0^1 g(t)\,dt. \]

Now

\[ D^*\alpha[f] = \alpha[Df] = \int_0^1 f'(t)\,dt = f(1) - f(0) \]

so $D^*\alpha$ is the linear function

\[ D^*\alpha: f(t) \mapsto f(1) - f(0). \]
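
Both pullbacks in this example can be checked on a sample polynomial. The sketch below is an illustration with assumed names: polynomials are stored as coefficient lists `[c0, c1, c2, ...]`, and the sample `f` is an arbitrary even polynomial of degree 4.

```python
from fractions import Fraction

def derivative(c):
    """Coefficients of f' given the coefficients of f."""
    return [k * c[k] for k in range(1, len(c))]

def evaluate(c, t):
    """Value of the polynomial with coefficients c at the point t."""
    return sum(Fraction(a) * Fraction(t) ** k for k, a in enumerate(c))

def integral_0_1(c):
    """Integral over [0, 1] of the polynomial with integer coefficients c."""
    return sum(Fraction(a, k + 1) for k, a in enumerate(c))

f = [3, 0, 2, 0, 5]  # f(t) = 3 + 2t^2 + 5t^4, an element of V
Df = derivative(f)   # f'(t) = 4t + 20t^3, an element of W

D_star_beta = evaluate(Df, 1)   # beta[Df] = f'(1)
D_star_alpha = integral_0_1(Df) # alpha[Df] = integral of f' over [0, 1]
print(D_star_beta, D_star_alpha, evaluate(f, 1) - evaluate(f, 0))
```

The last two printed values agree, as the fundamental theorem of calculus requires.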

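When we introduce bases for $V$, $W$, $V^*$, and $W^*$, the description of the adjoint
becomes particularly simple. Suppose that $V$ is $m$-dimensional, with basis
$\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$, and that we have introduced a dual basis $\{\alpha^1, \alpha^2, \ldots, \alpha^m\}$ in the dual
space $V^*$. Similarly, let $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ and $\{\beta^1, \beta^2, \ldots, \beta^n\}$ be dual bases in $W$ and
$W^*$ respectively. If $A$ is a linear transformation from $V$ to $W$, then

\[ A\mathbf{v}_i = \sum_{j=1}^{n} a_{ji}\mathbf{w}_j, \]

where the quantities $\{a_{1i}, a_{2i}, \ldots, a_{ni}\}$ form the $i$th column of the matrix which
represents $A$. Now we can calculate how $A^*$ acts on a basis element of $W^*$:

\[ (A^*\beta^k)[\mathbf{v}_i] = \beta^k[A\mathbf{v}_i] = \beta^k\Bigl[\sum_{j=1}^{n} a_{ji}\mathbf{w}_j\Bigr] = \sum_{j=1}^{n} a_{ji}\,\beta^k[\mathbf{w}_j]. \]

But since the basis $\{\beta^1, \ldots, \beta^n\}$ is dual to $\{\mathbf{w}_1, \ldots, \mathbf{w}_n\}$, we have

\[ \beta^k[\mathbf{w}_j] = \begin{cases} 1 & j = k, \\ 0 & j \ne k, \end{cases} \]

so that

\[ (A^*\beta^k)[\mathbf{v}_i] = a_{ki}. \]

Thus we may express $A^*\beta^k$ in terms of the basis elements $\{\alpha^1, \ldots, \alpha^m\}$ of $V^*$ as

\[ A^*\beta^k = \sum_{i=1}^{m} a_{ki}\,\alpha^i. \]
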

The quantities $\{a_{ki}\} = \{a_{k1}, a_{k2}, \ldots, a_{km}\}$ form the $k$th column of the matrix which
represents $A^*$, but they are also the $k$th row of the matrix which represents $A$.
This means that the matrix which represents $A^*$ is just the transpose of the matrix
which represents $A$. Thus, for example, if $V$ is three-dimensional, $W$ two-dimensional
and $A: V \to W$ is represented by

\[ A = \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix}, \]

the adjoint transformation $A^*: W^* \to V^*$ is represented by

\[ A^* = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix}. \]

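An easy way to see that the matrices representing $A$ and $A^*$ are transposes of
one another is to recall that elements of the dual space may be represented as row
vectors. In the present example, an element of $W^*$ may be thought of as a
two-component row vector:

\[ \beta = (\beta_1, \beta_2) \]

while an element of $V$ is a three-component column vector:

\[ \mathbf{v} = \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}. \]

Now $(A^*\beta)[\mathbf{v}] = \beta[A\mathbf{v}]$ is written as

\[ (\beta_1, \beta_2) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix}. \]

It is most natural to think of the matrix $A$ as acting first on the column vector to
its right to yield $A\mathbf{v}$, which is then acted upon by $\beta$. Alternatively, though, we can
think of it as acting first on the row vector to its left,

\[ (\beta_1, \beta_2) \begin{pmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{pmatrix} = (\alpha_1, \alpha_2, \alpha_3) \]

where $(\alpha_1, \alpha_2, \alpha_3)$ represents the vector $(A^*\beta) \in V^*$, which then acts on $\mathbf{v} \in V$. Thus,
if we reverse the usual conventions of matrix multiplication, letting a matrix act
on a row vector to its left, the same matrix represents both $A$ and $A^*$. If we want
to represent $A^*$ more conventionally, by a matrix which acts on a column vector
to its right, we must write

\[ \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{pmatrix} = \begin{pmatrix} a_{11} & a_{21} \\ a_{12} & a_{22} \\ a_{13} & a_{23} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix}. \]

Now, of course, the matrix representing $A^*$ is the transpose of the one which
represents $A$.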
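
The two ways of grouping the product $\beta A \mathbf{v}$ can be compared on a concrete case. The sketch below is illustrative: the numerical entries of `A`, `beta` and `v` are arbitrary, and the helper names are assumptions.

```python
def apply_matrix(A, v):
    """Matrix acting on a column vector to its right."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def pair(row_vector, column_vector):
    """Value of a dual-space element (row vector) on a vector (column vector)."""
    return sum(b * x for b, x in zip(row_vector, column_vector))

A = [[1, 2, 3], [4, 5, 6]]  # represents A: V -> W with dim V = 3, dim W = 2
beta = [7, 8]               # an element of W*
v = [1, 0, -1]              # an element of V

lhs = pair(beta, apply_matrix(A, v))         # beta[Av]
A_star_beta = apply_matrix(transpose(A), beta)  # A*beta, computed with the transpose
rhs = pair(A_star_beta, v)                   # (A*beta)[v]
print(lhs, A_star_beta, rhs)
```

Both groupings give the same number, which is the defining property $(A^*\beta)[\mathbf{v}] = \beta[A\mathbf{v}]$ written in coordinates.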

We turn now to an investigation of the kernel and image of the adjoint $A^*$. If
$\beta \in W^*$ is in the kernel of $A^*$, then $A^*\beta[\mathbf{v}] = 0$ for all $\mathbf{v} \in V$, so that $\beta[A\mathbf{v}] = 0$. This
means that $\beta$ annihilates all vectors of the form $A\mathbf{v}$. We conclude that the kernel
of $A^*$ annihilates the image of $A$.

Now suppose that $\alpha \in V^*$ is in the image of $A^*$, so that

\[ \alpha = A^*\beta \quad\text{for some } \beta \in W^*. \]

Suppose that $\mathbf{v}$ is an element of $\ker A$. Then

\[ \alpha[\mathbf{v}] = A^*\beta[\mathbf{v}] = \beta[A\mathbf{v}] = 0. \]

We conclude that the image of $A^*$ annihilates the kernel of $A$.

Putting these results together with the general results about dual spaces and
quotient spaces proved at the end of section 10.5, we can construct two diagrams
which summarize our general picture of vector spaces and linear transformations.
Looking at subspaces of $V$ and $V^*$, we have

\[
\begin{array}{ccccc}
V^*/\operatorname{im} A^* & \leftarrow & V^* & \leftarrow & \operatorname{im} A^* \\
\ker A & \rightarrow & V & \rightarrow & V/\ker A.
\end{array}
\]

The diagram reflects the fact that the quotient space $V^*/\operatorname{im} A^*$ may be identified
with the dual of $\ker A$, while the image of $A^*$ is dual to $V/\ker A$.
Looking at subspaces of $W$ and $W^*$, we have the diagram

\[
\begin{array}{ccccc}
W^*/\ker A^* & \leftarrow & W^* & \leftarrow & \ker A^* \\
\operatorname{im} A & \rightarrow & W & \rightarrow & W/\operatorname{im} A.
\end{array}
\]

Here $W^*/\ker A^*$ may be identified with the dual of $\operatorname{im} A$, while $\ker A^*$ is dual to
$W/\operatorname{im} A$. Numerous examples of these relationships will appear as we study electric
network theory.


Summary 

A Vector spaces

You should know the axioms for a vector space and be able to apply them. 









Given a basis for a vector space, you should be able to recognize or construct 
a dual basis for its dual space. 

Given a subspace $U$ of a vector space $V$, you should be able to construct and
use a basis for the annihilator space $U^\perp$ and the quotient space $V/U$.

B Linear transformations

You should be able to write down the matrix that represents a linear transformation
$A: V \to W$ between given bases.

You should be able to state, prove and apply the rank-nullity theorem. 

Given the matrix of a linear transformation A, you should know how to use 
row reduction to determine bases for kerA and im A and to find the general 
solution to Av = w. 

You should know the definition of the adjoint A* of a linear transformation A 
and be able to state, prove, and apply relations between the kernel and image of 
A and the kernel and image of A*. 


Exercises 


10.1. Consider the five-dimensional vector space $V$ of polynomials $f(t)$ of degree
$\le 4$. Determine whether each of the following is a subspace. If not, explain
why. If so, find a basis.

(a) Elements of $V$ satisfying $f(t) = f(-t)$.

(b) Elements satisfying $f(0) = 1$.


/(D=/(-*) • 


(d) Elements satisfying (t) dt — 0. 


10.2.(a) Find a basis for the subspace of $\mathbb{R}^3$ defined by $\alpha[\mathbf{v}] = 0$, where
$\alpha = (2, -3, 1)$.

(b) Find a basis for the subspace of $\mathbb{R}^3$ defined by $\alpha[\mathbf{v}] = 0$ and $\beta[\mathbf{v}] = 0$,
where $\alpha = (2, -3, 1)$ and $\beta = (2, 1, -1)$.

(c) Find a basis for the annihilator space of the subspace $W \subset \mathbb{R}^3$ spanned by



10.3.(a) Show that the set of functions $f(t)$ satisfying $f'' + 5f' + 6f = 0$ is a vector
space $V$.

(b) Three elements of the space V* dual to this space are 

=JW( 0 ), 
or 2 =/-/m 

Find a relation among $\alpha^1$, $\alpha^2$, $\alpha^3$ which shows that they are linearly
dependent.

(c) As a basis for V, choose 








and B 2 in terms of a 1 and a 2 above. 



(b) Show that $S + T$ (the set of vectors that are linear combinations of vectors
in $S$ and $T$) is a subspace of $V$.

(c) Show that $\dim(S + T) = \dim(S) + \dim(T) - \dim(S \cap T)$.

Hint: Start with a basis for $S \cap T$ and extend it to a basis for $S + T$.


r\ r\ r 

(d) Suppose V is M 4 , S is spanned by ^ and 1 ^ , and T is spanned by 2 

i0/ \ 1 1 1 


and . Construct a basis for S n T, for S + T, and for the annihilator 


space of (S + T). 

).5. Let W be the subspace of R 3 spanned by I 



m 

/l\ 


(a) Show that e r =| 

-L- 

i ^ 

|and e 2 = 1 

I form a basis for the-quotient space - 


n\(°\ °\ 

l 3 /W , Express ! Oil 1 l and 1 0 las linear combinations of the 


(b) Show that $\alpha = (2, -1, 0)$ and $\beta = (4, 0, -1)$, both elements of the dual
of $\mathbb{R}^3$, are a basis for the annihilator space $W^\perp$. In terms of $\alpha$ and $\beta$,
construct a basis $\{e^1, e^2\}$ for $W^\perp$ which is dual to the basis $\{e_1, e_2\}$ for
$\mathbb{R}^3/W$.

10.6.(a) Let $W$ be a subspace of a vector space $V$; let $U$ be a subspace of $W$. Prove
that $W/U$ is a subspace of $V/U$, and show that there is a natural
identification of $\dfrac{V/U}{W/U}$ with $V/W$.

(b) For the case where V is R 3 , W is the plane x + y + z = 0, and U is the line 

f'\ 

spanned by I— 2 I, construct explicit bases for these spaces. 

V V 

(c) Figure out what is happening in the dual space, that is, construct 

subspaces of V* which are the dual spaces or annihilator spaces for the 
various spaces in (a). Do this first in general, then for the explicit case 
described in (b). _ 


A 4- 









< <. n< inear trans ormns 


Two elements of the dual space V* a rp 


fi. 


/(0-/(l), 

2 ). 


Express the basis elements $\{\alpha^1, \alpha^2\}$ which are dual to $\{\mathbf{v}_1, \mathbf{v}_2\}$ as linear
combinations of $\beta^1$ and $\beta^2$.

Before working exercises 10.8-10.12, reread section 4.2 on the Gram-Schmidt process.

10.8.(a) Find the dimension of the space of trigonometric polynomials spanned by 
(l,sin 2 x, cos 2 x, sin 4 x, sin 2 x cos 2 x, cos 4 x}. 

(b) Define a scalar product on this space by 

(f,g) = 


n 


f{x)g{x) dx. 


With respect to this scalar product, construct an orthonormal basis. 

10.9(a) Consider the vector space of odd polynomials of degree ^ 3 with basis 
{t,t 3 } and scalar product Jo/(f)g(t)dt. Construct an orthogonal basis (you 
need not normalize). 

(b) Given the linear operator $f: V \to V$ defined by $f(p(t)) = tp'(t)$, construct the
matrix $A$ which represents $f$ with respect to the $\{t, t^3\}$ basis and the
corresponding matrix which represents $f$ with respect to the orthogonal basis
which you constructed.

(c) Three dements of V*, the 

* 2 0] =P l ( 0), <x 3 [p] = jotp(t)dt. 


follows: 


Find a relationship among $\alpha^1$, $\alpha^2$, $\alpha^3$ which shows explicitly that they are
linearly dependent.


10.10. Let $V$ be the two-dimensional vector space of linear
combinations of $f_1 = 1$ and $f_2 = \cos^2 t$. Define a scalar product on this
space by $(f, g) = (2/\pi)\int_0^{\pi/2} f(t)g(t)\,dt$.

Note:

$$\frac{2}{\pi}\int_0^{\pi/2} \cos^2 t\,dt = \frac{1}{2}; \qquad \frac{2}{\pi}\int_0^{\pi/2} \cos^4 t\,dt = \frac{3}{8}.$$


(b) Three elements of the dual space $V^*$ are:

$$\alpha^1: f \mapsto f(\pi/2), \qquad \alpha^2: f \mapsto \int_0^{\pi/2} f(t)\cos t\,dt, \qquad \alpha^3: f \mapsto \int_0^{\pi/2} f(t)\sin t\,dt.$$

Show explicitly that these are linearly dependent.

10.11. Consider the three-dimensional vector space $V$ whose elements are the
polynomials of degree $\le 2$ multiplied by $e^{-2t}$. A basis for this space is

$$\mathbf{v}_1 = e^{-2t}, \quad \mathbf{v}_2 = te^{-2t}, \quad \mathbf{v}_3 = t^2e^{-2t}.$$

(a) With respect to this basis, write down the matrix $D$ which represents
the operation of differentiation; i.e.,

(b) Construct the matrix $D^2 + 4D + 4$. (There are a lot of zeros in this
matrix!)

(c) The general solution to the equation $\ddot{x} + 4\dot{x} + 4x = e^{-2t}$ lies in the
vector space $V$. Find it by using the matrix that you constructed in (b).




10.12. Consider the vector space $V$ of solutions to the differential equation

$$\ddot{x} + 3\dot{x} + 2x = 0.$$

Define a scalar product on this space by

$$(\mathbf{f}, \mathbf{g}) = \int_0^\infty f(t)g(t)\,dt.$$

(Remember that $\int_0^\infty e^{-at}\,dt = 1/a$.)

(a) Take $\mathbf{f}_1 = e^{-2t}$ as the first basis vector for $V$. Construct a second vector
$\mathbf{f}_2$ that is orthogonal to $\mathbf{f}_1$.

(b) With respect to the basis $\{\mathbf{f}_1, \mathbf{f}_2\}$, construct the matrix $D$ that
represents the operation of differentiation with respect to $t$. Verify that
$D^2 + 3D + 2I = 0$.

Three elements of the dual space V* are the following: 

*i[f] = f(0), 
ar 2 [f] = f(0). 


* 3 [f] = 




(c) State on what grounds you know that there exist numbers $\lambda_1$, $\lambda_2$ and $\lambda_3$
(not all zero) such that

$$\lambda_1\alpha^1 + \lambda_2\alpha^2 + \lambda_3\alpha^3 = 0.$$

Then determine $\lambda_1$, $\lambda_2$ and $\lambda_3$.

(d) In terms of $\alpha^1$ and $\alpha^2$, construct a basis $\{\beta^1, \beta^2\}$ for $V^*$ that is dual to
the basis $\{\mathbf{f}_1, \mathbf{f}_2\}$ that you constructed in part (a).


10.13. Let Kbethefour-dimensional vector space of polynomials of degree < 3, 


withrbasis 


spacer 
T(t + a) 



. Let D be the differentiation operator on this 
let T a be the translation operator: T a f(t) = 


(a) Construct the matrices which represent $D$ and $T_a$ relative to the given
basis, and show that

$$T_a = I + Da + \tfrac{1}{2}D^2a^2 + \tfrac{1}{6}D^3a^3.$$

(b) Prove that if $V$ is the space of polynomials of degree $\le n$,

$$T_a = e^{Da}.$$

(Hint: Think of Taylor’s theorem, and you need not construct any
matrices.)

10.14. Let $V$ be the space of one-forms on the plane for which the coefficients of
$dx$ and $dy$ are quadratic functions. A basis for $V$ is $x^2\,dx$, $xy\,dx$, $y^2\,dx$, $x^2\,dy$,
$xy\,dy$, $y^2\,dy$. Any curve $\Gamma$ in the plane defines an element $\alpha_\Gamma$ of the dual
space $V^*$ by the rule

$$\alpha_\Gamma[\omega] = \int_\Gamma \omega.$$

(a) Invent a non-trivial curve $\Gamma$ for which $\alpha_\Gamma$ is the zero element of $V^*$.

(b) Find a basis for the subspace of $V$ which is annihilated by $\alpha_\Gamma$, where $\Gamma$
is any closed curve.

(c) Let $\Gamma_1$, $\Gamma_2$ and $\Gamma_3$ be any three closed curves in the plane. Prove that
$\alpha_{\Gamma_1}$, $\alpha_{\Gamma_2}$ and $\alpha_{\Gamma_3}$ are linearly dependent. (Hint: Use Green’s theorem to
convert the integrals to double integrals.)

(d) Find curves $\Gamma_1, \Gamma_2, \ldots, \Gamma_6$ (straight line segments will do the job) so
that the elements $\alpha_{\Gamma_1}, \alpha_{\Gamma_2}, \ldots, \alpha_{\Gamma_6}$ form a basis for $V^*$ which is dual to
the basis listed above. For example,

$$\int_{\Gamma_2} xy\,dx = 1 \quad \text{while} \quad \int_{\Gamma_j} xy\,dx = 0, \quad j = 1, 3, 4, 5, 6.$$


10.15. Consider the linear transformation / from R 4 to 1R 3 whose matrix is 

( 12 0 l\ 

10 2-3). 

0 i -1 y 

(a) Find a basis for the kernel (null space) of f, and construct the general 
solution to the equation 


m= 




iin how you are sure that yo ur ba sis vect ors ar 


10.16. 


transformation / from R 4 to IR 3 whose matrixls 

— /-2 —4 - 2—iV- 

i 3 —2 e h 
\3 1 -2 8 / 

the kernel of f, and construct the general solution to 


(b) Write down a basis for the image of/. 

(c) Two elements of the quotient space H are 

h x —I 1 1 and h 2 =1 0 I. 

Show explicitly that h x and h 2 are linearly dependent. (Hint: Look back 
at part (a).) 

10.17. Let A be the matrix of a linear transformation f:U 4 -+R 4 given by 

1 0-1 l\ 

0 2 4 -8 

A ~ 2 1-4 6 

U1 -3 5 , 



e solutions w l5 ...,w k to 








(c) Do the vectors wv form a basis for P* 4 ? What does 
your answer imply about the transformation/? 

(d) Find a basis for im A and express this basis in terms of the w ; and the v,-. 
How dr.es this answer relate to your answer to (c)? Hint: Something 

ctrunop ic rrninO HtlH 


/ 2 4 2 2^ 


A = 

12 0 2 

3 6 5 1 

\0 0 3 - 31 



(a) Using row reduction, construct a basis for the kernel of A and a basis 
for the image of A, and construct the general solution to the equation 

/ *\ 


VkerA 


(b) Construct basis vectors U! and u 2 for the quotient space U — 

. . IX . 


Express the vector 


in terms of u, and u,. 


10.19. Let W be the sub s pace of IR 4 spanned by the vectors 





/ \ 


/ \ 



1 l \ 


h \ 


m 



l 1 


3 


i 



j 

, w 2 = 

5 

, w 3 = 

3 



l iJ 




\ 5 I 



r 1 / 


Yl 


vf 



The scalar product in $\mathbb{R}^4$ is the ordinary Euclidean one.
(a) Construct an orthonormal basis for $W$.


1 ^ 


(b) Write the vector v = ^ 

as the sum of a vector in W and a vector 


orthogonal to W. 

(c) Let/: R 4 -*■ R 3 be defined by 

/V i,v)\ 

/(▼) = ( (w 2 ,v) l 

\(w 3 ,v)/ 

Write down the matrix representing /. By row-reduction, construct 
the general solution to the equation 



annihilator space $U^\perp$ is spanned by the row vectors

$$\alpha^1 = (1, 2, 0, -1, 2),$$
$$\alpha^2 = (2, 4, 3, 4, 4),$$
$$\alpha^3 = (0, 0, 1, 2, 1),$$
$$\alpha^4 = (3, 6, 5, 7, 8).$$

Then construct the general solution to the simultaneous linear equations
$\alpha^1[\mathbf{v}] = 5$; $\alpha^2[\mathbf{v}] = 16$; $\alpha^3[\mathbf{v}] = 3$; $\alpha^4[\mathbf{v}] = 27$.

10.21. Use row reduction to calculate the inverse of the matrix 



10.22. Let $V$ be the space of functions $f$ on $\mathbb{R}^2$ with the property that
$f(\lambda x, \lambda y) = \lambda^3 f(x, y)$. A basis for $V$ is

$$\{\mathbf{v}_1 = x^3,\ \mathbf{v}_2 = x^2y,\ \mathbf{v}_3 = xy^2,\ \mathbf{v}_4 = y^3\}.$$

Let $W$ be the space of one-forms on the plane which are quadratic
functions of $x$ and $y$, with basis

$$\{\mathbf{w}_1 = x^2\,dx,\ \mathbf{w}_2 = xy\,dx,\ \mathbf{w}_3 = y^2\,dx,\ \mathbf{w}_4 = x^2\,dy,\ \mathbf{w}_5 = xy\,dy,\ \mathbf{w}_6 = y^2\,dy\}.$$

The operator $d$ is then a linear transformation from $V$ to $W$.



Figure 10.6 

(c) Two elements of the dual space $W^*$ are $\alpha^1$, which assigns to any $\omega \in W$
the value of the integral $\int_{a_1}\omega$, where $a_1$ is the unit square $0 \le x \le 1$,
$0 \le y \le 1$, traversed counterclockwise, and $\alpha^2$, which uses instead the
unit square $0 \le x \le 1$, $-1 \le y \le 0$. Construct the row vectors which
represent $\alpha^1$ and $\alpha^2$, and construct linear combinations of $\alpha^1$ and $\alpha^2$
which are the dual basis for your basis of $G$.

(d) Let $U$ denote the space of two-forms on the plane which depend
linearly on $x$ and $y$, with basis

$$\{\mathbf{u}_1 = x\,dx \wedge dy,\ \mathbf{u}_2 = y\,dx \wedge dy\}.$$

Construct the matrix which represents the operator $d$ from $W$ to $U$.
What is the kernel of this operator?


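10.23. Let $A$ be a linear transformation from $V$ to $W$, $A^*$ the adjoint transformation
from $W^*$ to $V^*$. Suppose that we have not chosen dual bases in $V$ and
$V^*$. Instead, we have a basis $\{\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_m\}$ for $V$, a basis $\{\alpha^1, \alpha^2, \ldots, \alpha^m\}$
for $V^*$, with $\alpha^i[\mathbf{v}_j] = S_{ij}$. The numbers $S_{ij}$ form an $m \times m$ matrix $S$.
Similarly, we have bases $\{\mathbf{w}_1, \mathbf{w}_2, \ldots, \mathbf{w}_n\}$ and $\{\beta^1, \beta^2, \ldots, \beta^n\}$ in $W$ and $W^*$
respectively, with $\beta^k[\mathbf{w}_l] = T_{kl}$. The numbers $T_{kl}$ form an $n \times n$ matrix $T$.

If the matrix $A$ represents the transformation relative to the bases $\{\mathbf{v}_j\}$ and $\{\mathbf{w}_l\}$,
what is the matrix of $A^*$ relative to bases $\{\beta^i\}$ and $\{\alpha^i\}$? Express your
answer in terms of $A^T$, the transpose of the matrix $A$, and the matrices $S$
and $T$.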

10.24. For a vector space $V$ with a scalar product $(\mathbf{v}_1, \mathbf{v}_2)$, the adjoint $A^*$ of a linear
transformation $A: V \to V$ is another linear transformation $A^*: V \to V$ defined by

$$(A^*\mathbf{v}_1, \mathbf{v}_2) = (\mathbf{v}_1, A\mathbf{v}_2).$$

(a) Show that this definition follows from the definition of $A^*$ as a
transformation from $V^*$ to $V^*$, combined with the usual identification
of $V^*$ with $V$ which arises from the scalar product.

(b) Show that, relative to an orthonormal basis $\{\mathbf{e}_1, \ldots, \mathbf{e}_n\}$, the matrix
representing $A^*$ is the transpose of the matrix representing $A$.

(c) Let $\pi$ denote the linear operation of orthogonal projection from $V$
onto a subspace $W$; i.e., for any $\mathbf{v} \in V$, $\pi\mathbf{v}$ lies in $W$, and $\mathbf{v} - \pi\mathbf{v}$ is
orthogonal to $W$. (Note: $\pi$ is a transformation from $V$ to $V$, with
$\operatorname{im}\pi = W$.) Show that $\pi^* = \pi$.

(d) Let V be the space of p olynom ials of de gree ^ 2, with scalar produ ct 
_ (f , g ) = f p/ ( r)g(r) d t. C hoose a basis Yt — 1, v 2 = t , v 3 = t 2 (w hich is no t 

orthonormal). Let A be the linear transformation defined by 

+w~ 


Consider the linear transformation from a lour-dimensional veer 
V to a three-dimensional vector space W, which is representec 
matrix 


A=\2 1 2 2. 

\2 3 -2 10/ 

(a) Let M denote the image of A. Show that the vectors =1 2 land 
m 9 = 1 form a basis for M. 


(b) Let H — W/M. Sho w tha t a basis for this quotient space consists of the 

A 

single element h x = I 0 j, which is the equivalence class containing all 
\ 0 / - 




0 \ ,/■ 






— - hi I i.e., find an element m of M such that! 1 


-*-^X 




) 


A 


Th e n e xpr e ss 


0 


W 


in t e rms of h t . 


/TV 

-4 

0 

\ 1/ 

/ 


(c) Let N denote the kernel of A. One element of N is n t = 


Find a 


second vector n 2 such that n x and n 2 form a basis for N. 


row reduction to A, then look for a vector of the form 


Hint: Apply 


\ 

/ «\\ 

b 

VII 


(d) Let G denote the quotient space VIN. Define basis vectors in this space 

-m- 

(T 


bygi = 


o 


m 

e 


A 5 


t€2 


I . Then show that 


l °\ 

0 


1 


— 2gj—2g 2 and express 


W 


in terms of g x and g 2 . 


0 

52 

(e) Construct the non-singular 2x2 matrix C which represents T as a 


transformation from G (basis g 1; g 2 ) to M (basis m 1 .m 2 ). 


(f) Express| 1 ] in terms of m t and m 2 . (Hint: Apply the same operations 

Jy 

that you used to row-reduce the matrix.) Then apply C \ and thereby 

( 2 \ 

solve the equation Av =[ 1 1, obtaining the answer in the form v = 

l a 

1 + n, where n is an arbitrary element of N. 


flgi + bg 2 = 


\ k 


10.26. Let $f^*$ denote the adjoint of the transformation $f$ in the preceding
problem. The adjoint $f^*$ is a transformation from the dual space $W^*$ to the
dual space $V^*$, defined by $f^*\beta(\mathbf{v}) = \beta(f\mathbf{v})$, where $\beta$ and $\mathbf{v}$ are arbitrary
elements of $W^*$ and $V$. With respect to the bases which are dual to the
original bases in $V$ and $W$, it is represented by the transpose of



_? l\ 





A T = 

1 1 J 



0 2 



nr/ 



A T in terms of gf and gf. 

(b) Show that the kernel of f* is dual to the space H. Let y' be dual to 
h^i.e., -y'(h!) = 1. Let p l be the element of W* which picks out the first 

f a \ 

component of a vector; i.e., wf b 1 —a. Express p l in terms of y'. Do 

the same for p 2 and p 3 , which pick out the second and third 
components respectively. 

(c) Find vectors a 1 and a 2 in W* such that f*a 1 = p l ,f*a 2 = p. Show 
that or 1 ,* 2 , y' form a basis for W*. 
Chapter 11 is devoted to proving the central facts about
determinants of $n \times n$ matrices. The subject is developed
axiomatically, and the basic computational algorithms are
presented.


Introduction 


In this chapter we discuss properties of the determinants of $n \times n$ matrices. Let $A$
be an $n \times n$ matrix. We will let $A_1, \ldots, A_n$ denote the columns of $A$. Thus, if $I$ is
the $n \times n$ identity matrix,

$$I_1 = \begin{pmatrix}1\\0\\ \vdots\\0\end{pmatrix}, \quad I_2 = \begin{pmatrix}0\\1\\ \vdots\\0\end{pmatrix}, \quad \ldots, \quad I_n = \begin{pmatrix}0\\0\\ \vdots\\1\end{pmatrix}.$$

For any matrix $A$, then

$$A_1 = AI_1, \ \ldots, \ A_n = AI_n,$$

in other words, $A_i$ is just the image of $I_i$, the $i$th element of the standard
basis, under $A$.

We expect to be able to define Det $A$ as the oriented volume of the parallelepiped
spanned by $A_1, \ldots, A_n$. Our experiences in Chapters 1 and 9 suggest that this
oriented volume may be multi-linear, that is, linear in $A_1$ when $A_2, \ldots, A_n$ are
held fixed, linear in $A_2$ when $A_1, A_3, \ldots, A_n$ are held fixed, and so on. Also (due to
the orientation), we expect that Det $A$ should be antisymmetric in the columns;
that is, interchanging two columns of a matrix changes the sign of its determinant.
We must define the determinant, and prove that it has the requisite properties. In
fact, we shall follow the classical treatment of Artin and characterize the determinant
axiomatically. That is, we shall write down a simple list of properties we expect
the determinant function to have, and shall show that these properties uniquely
characterize the determinant. In other words, there is at most one such function.





inmons 


exist a tunction satisfying the axioms, oy 
also satisfy the axioms, we will be able to conclude that all these definitions must 
evaluated on a matrix $A$ we write $D(A)$ or $D(A_1, \ldots, A_n)$ when we want to emphasize
that it is a function of the $n$ column vectors $A_1, \ldots, A_n$. If we keep all the columns
but the $k$th constant, we obtain a function of a single column. We shall write this
function as $D_k$. (It is understood that all the other columns are held fixed with given
values.) For example, we shall write


for Hi 


(Strictly speaking, we should specify the vectors A x = 


11.1. Axioms for determinants

A function $D$ of matrices is called a determinant if it satisfies the following
conditions:

Each of the functions $D_k$ is linear:

$$D_k(A_k + A_k') = D_k(A_k) + D_k(A_k'), \qquad D_k(cA_k) = cD_k(A_k). \tag{11.1}$$

In other words, $D$ is linear in each column when all the other columns are
kept fixed.

If two adjacent columns of a matrix $A$ are equal, then $D(A) = 0$. (11.2)

$$D(I) = 1. \tag{11.3}$$

We will now draw some consequences from (11.1) and (11.2), assuming that
a function $D$ exists satisfying (11.1) and (11.2).





Adding any multiple of one column to an adjacent column does not change the value
of $D(A)$.

Proof

$$D(A_1, \ldots, A_k, cA_k + A_{k+1}, \ldots, A_n) = D(A_1, \ldots, A_k, A_{k+1}, \ldots, A_n) + cD(A_1, \ldots, A_k, A_k, \ldots, A_n)$$

by (11.1). But $D(A_1, \ldots, A_k, A_k, \ldots, A_n) = 0$ by (11.2). So

$$D(A_1, \ldots, A_k, cA_k + A_{k+1}, \ldots, A_n) = D(A_1, \ldots, A_n). \tag{11.4}$$


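For example,

$$D\begin{pmatrix}1&4&7\\2&5&8\\3&6&9\end{pmatrix} = D\begin{pmatrix}1&4-4\cdot 1&7\\2&5-4\cdot 2&8\\3&6-4\cdot 3&9\end{pmatrix} = D\begin{pmatrix}1&0&7\\2&-3&8\\3&-6&9\end{pmatrix}.$$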

Now add the $k$th column to the $(k+1)$st, then subtract the resulting $(k+1)$st
column from the $k$th, then add the $k$th to the $(k+1)$st again, so

$$\begin{aligned}
D(A_1, \ldots, A_n) &= D(A_1, \ldots, A_k, A_k + A_{k+1}, A_{k+2}, \ldots, A_n)\\
&= D(A_1, \ldots, A_k - (A_k + A_{k+1}), A_k + A_{k+1}, \ldots, A_n)\\
&= D(A_1, \ldots, -A_{k+1}, A_k + A_{k+1}, \ldots, A_n)\\
&= D(A_1, \ldots, -A_{k+1}, A_k + A_{k+1} - A_{k+1}, \ldots, A_n)\\
&= D(A_1, \ldots, -A_{k+1}, A_k, A_{k+2}, \ldots, A_n)\\
&= -D(A_1, \ldots, A_{k+1}, A_k, A_{k+2}, \ldots, A_n)
\end{aligned} \tag{11.5}$$

by (11.4) and (11.1). Thus

Interchanging two adjacent columns changes the sign of $D(A)$.
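The three column operations in this argument can be traced on a concrete matrix. The sketch below is only illustrative: the sample columns are hypothetical, `add_multiple` is a helper name introduced here, and `det` evaluates the determinant by the sum over permutations (a formula the text derives only later, as (11.15)), so it serves as an independent check rather than as part of the proof.

```python
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (count inversions)."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det(columns):
    """Determinant of the matrix whose columns are given, via the permutation sum."""
    n = len(columns)
    total = 0
    for perm in permutations(range(n)):
        term = sign(perm)
        for col, row in enumerate(perm):
            term *= columns[col][row]
        total += term
    return total

def add_multiple(columns, src, dst, c):
    """Return new columns with c * columns[src] added to columns[dst]."""
    out = [list(col) for col in columns]
    out[dst] = [d + c * s for d, s in zip(out[dst], out[src])]
    return out

cols = [[2, 0, 1], [1, 3, -1], [4, 1, 5]]   # hypothetical sample columns A_1, A_2, A_3
k = 0                                       # work with the first two columns

step1 = add_multiple(cols, k, k + 1, 1)     # (A_k, A_k + A_{k+1})
step2 = add_multiple(step1, k + 1, k, -1)   # (-A_{k+1}, A_k + A_{k+1})
step3 = add_multiple(step2, k, k + 1, 1)    # (-A_{k+1}, A_k)

swapped = [cols[1], cols[0], cols[2]]       # (A_{k+1}, A_k, ...)
```

None of the three steps changes the determinant, and the final columns are $(-A_{k+1}, A_k)$, so the interchanged matrix `swapped` has determinant `-det(cols)`.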

Now this implies that 

If any two columns of A are equal, then D(A) = 0. (11.6) 


Indeed, if two columns are equal, we can keep interchanging adjacent columns, which
by (11.5) changes at most the sign of $D(A)$, until the two equal columns are adjacent;
then (11.2) gives (11.6). We can now apply the argument proving (11.4) to conclude:

Adding any multiple of one column to another does not change the value of
$D(A)$. (11.7)


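Thus, continuing our example,

$$D\begin{pmatrix}1&0&7\\2&-3&8\\3&-6&9\end{pmatrix} = D\begin{pmatrix}1&0&0\\2&-3&-6\\3&-6&-12\end{pmatrix} \quad \text{(subtract } 7 \times \text{ first column from last)}$$

$$= D\begin{pmatrix}1&0&0\\2&-3&0\\3&-6&0\end{pmatrix} \quad \text{(now add } -2 \times \text{ second column to last).}$$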









column in terms of the others 


i - 

ting c,A i. etc, from the 


n^n i 


change D(A) (by (11.6) 


In particular, wt 


rat any n vectors of the lorm 


(with a zero in the first position) are linearly dependent. Therefore 

If all the entries of the top row of A are zero, then D(A ) = 0. (11.10) 

Suppose that at least one entry in the top row of A does not vanish. By an 
interchange of columns, if necessary, we can arrange that the first column has a 
non-zero entry in the top row. So 

$$D(A) = \pm D(B) \quad \text{where } b_{11} \ne 0.$$

Now

$$D(B) = b_{11}D(B')$$

where the first column, $B_1'$, of $B'$, is $B_1' = (1/b_{11})B_1$.

By subtracting off suitable multiples of the first column from each of the
remaining columns we can arrange that all the other entries in the top row vanish,


where 


Now consider $D(B'')$ as a function of the columns of $C$. Clearly it satisfies conditions
(11.1) and (11.2). Also, if $C$ were the $(n-1) \times (n-1)$ identity matrix, we could
without changing the value of $D(B'')$ make all the entries $b'_{21}, b'_{31}$, etc., in the first
column vanish just by subtracting off multiples of the second, third, etc., columns
from the first. For example,


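$$D\begin{pmatrix}1&0&0&0\\2&1&0&0\\3&0&1&0\\4&0&0&1\end{pmatrix} = D\begin{pmatrix}1&0&0&0\\0&1&0&0\\3&0&1&0\\4&0&0&1\end{pmatrix} \quad \text{(subtracting } 2 \times \text{ the second column from the first)}$$

$$= D\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\4&0&0&1\end{pmatrix} \quad \text{(subtracting } 3 \times \text{ the third column from the first)}$$

$$= D\begin{pmatrix}1&0&0&0\\0&1&0&0\\0&0&1&0\\0&0&0&1\end{pmatrix} \quad \text{(subtracting } 4 \times \text{ the fourth column from the first).}$$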

In other words,

$$D\begin{pmatrix}1 & 0 \cdots 0\\ b'_{21} & \\ \vdots & C\\ b'_{n1} & \end{pmatrix},$$

as a function of C, satisfies all the axioms for a determinant for (n — 1) x (n — 1) 
matrices. Therefore 

D(B") = D(C) 

where, on the right, we mean the D-function (if it exists) for (n — 1) x (n — 1) matrices. 
for $(n-2) \times (n-2)$ matrices. Eventually we get down to a $1 \times 1$ ‘matrix’ where
the axioms (11.1) and (11.3) imply that

$$D(a) = aD(1) = a.$$
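The reduction just described (interchange columns to bring a non-zero entry into the top-left corner, factor out $b_{11}$, clear the rest of the top row, then pass to the smaller matrix $C$) can be written as a short loop. This is a minimal sketch under stated assumptions, not the book's own code: the function name `det_by_column_reduction` and the sample matrices are made up for illustration, exact arithmetic is done with `fractions.Fraction`, and the input is assumed to be a non-empty square matrix given as a list of rows.

```python
from fractions import Fraction

def det_by_column_reduction(rows):
    """Determinant via the column operations of section 11.1 (n >= 1 assumed)."""
    # Store the matrix as a list of columns with exact rational entries.
    cols = [[Fraction(x) for x in col] for col in zip(*rows)]
    sign = 1
    factor = Fraction(1)
    while len(cols) > 1:
        # Find a column with a non-zero entry in the top row.
        pivot = next((j for j, col in enumerate(cols) if col[0] != 0), None)
        if pivot is None:
            return Fraction(0)          # top row is all zero, so D = 0
        if pivot != 0:
            cols[0], cols[pivot] = cols[pivot], cols[0]
            sign = -sign                # interchanging two columns changes the sign
        b11 = cols[0][0]
        factor *= b11                   # D(B) = b11 * D(B')
        first = [x / b11 for x in cols[0]]
        # Subtract multiples of the first column to clear the top row, then
        # drop the first column and the top row: what remains is the matrix C.
        cols = [[x - col[0] * f for x, f in zip(col, first)][1:] for col in cols[1:]]
    return sign * factor * cols[0][0]   # 1 x 1 case: D(a) = a

sample = [[1, 1, 1, 1],
          [2, 3, 3, 3],
          [2, 2, 3, 3],
          [2, 2, 2, 3]]
value = det_by_column_reduction(sample)
```

For the sample matrix every pivot is 1 and no interchange is needed, so `value` is 1. The entries of the first column below the top row are simply discarded, which is justified by the observation above that they do not affect $D(B'')$.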
and $D(B) = D(C)$ where

/ H ^ r\\ 

/ ; 2 y \ 


C — j 13 — 4 16 J 


M8 2 14/ 



1NUW 

~7n i 9^ 

D(C) = 2D [(13 -2 16 ) I 

\\18 1 14// 

// 1 0 °v 

= -2D -2 13 16 

\\ 1 1 14/ 

// 1 ° ° 
= — 2D [-2 27 34 

\\ 1 -6 5 

) 

)) 


= -2D 


= -121D 



= -2-21D 


5 + 34 -— 


23 -. 


-2-27(5+3T-^}: 


— 


In the next section we shall give a different proof of the uniqueness of $D(A)$, and
a different recipe. The uniqueness implies that these two recipes must give the
same answer. But we must still show that a function satisfying (11.1), (11.2) and (11.3)
exists.


We can, however, derive an important consequence from our current
procedure for computing $D(A)$. Suppose the matrix $A$ is of the form

$$A = \begin{pmatrix} L & M\\ 0 & N \end{pmatrix}$$

where L is a k x k matrix, N an (n — k) x (n — k) matrix and M a matrix with k 
rows and n — k columns. In other words, suppose that the first k columns of A 
all have zeros in their last n — k positions. Then the first k columns of A are linearly 
dependent or independent if and only if the columns of L are. That is, the last 
n — k zero positions in these vectors do not affect the dependence or independence 
of these columns. If these columns are linearly dependent, then 

D(A) = 0 and D(L) = 0. 

If these columns are linearly independent, then in applying our algorithm, we can
use the first $k$ columns in the first $k$ steps and thus replace $M$ by the zero matrix
11.2. The multiplication law and other consequences of the axioms

Let us draw some further consequences from (11.8). Let $(\nu_1, \ldots, \nu_n)$ be any
permutation of $(1, \ldots, n)$. We can rearrange the columns, one at a time, in the matrix

$$(A_{\nu_1}, \ldots, A_{\nu_n})$$

until they are back in their original order. At each stage we apply (11.8), and
conclude that

$$D((A_{\nu_1}, \ldots, A_{\nu_n})) = \pm D((A_1, \ldots, A_n)) \tag{11.12}$$

where the $\pm$ does not depend on the particular entries of $A$. Applying (11.12) to
the identity matrix, we see that

$$D((I_{\nu_1}, \ldots, I_{\nu_n})) = \pm 1$$

and hence that

$$D((A_{\nu_1}, \ldots, A_{\nu_n})) = D((I_{\nu_1}, \ldots, I_{\nu_n}))D(A). \tag{11.13}$$

Now let $B = (b_{ij})$ be a second $n \times n$ matrix and let

$$C = AB.$$

Then the columns of $C$ are given by

$$C_k = b_{1k}A_1 + b_{2k}A_2 + \cdots + b_{nk}A_n.$$


Now in computing $D(C)$ we may first apply (11.1) to the first column of $C$, getting
a sum; then to each summand we apply (11.1) to the second column and so on.
For example, in the case $n = 3$,

$$C_1 = b_{11}A_1 + b_{21}A_2 + b_{31}A_3,$$
$$C_2 = b_{12}A_1 + b_{22}A_2 + b_{32}A_3,$$
$$C_3 = b_{13}A_1 + b_{23}A_2 + b_{33}A_3,$$

so

$$\begin{aligned}
D(C_1, C_2, C_3) &= b_{11}D(A_1, C_2, C_3) + b_{21}D(A_2, C_2, C_3) + b_{31}D(A_3, C_2, C_3)\\
&= b_{11}\{b_{12}D(A_1, A_1, C_3) + b_{22}D(A_1, A_2, C_3) + b_{32}D(A_1, A_3, C_3)\}\\
&\quad + b_{21}\{b_{12}D(A_2, A_1, C_3) + b_{22}D(A_2, A_2, C_3) + b_{32}D(A_2, A_3, C_3)\}\\
&\quad + b_{31}\{b_{12}D(A_3, A_1, C_3) + b_{22}D(A_3, A_2, C_3) + b_{32}D(A_3, A_3, C_3)\}.
\end{aligned}$$

Before proceeding to the next step, we can eliminate all repeated columns. It is
clear that at the end only expressions of the form $D((A_{\nu_1}, A_{\nu_2}, A_{\nu_3}))$ will be left and
these, by (11.12), are equal to $\pm D(A)$. Thus, in general, we see that

$$D(C) = D(A)\sum \pm\, b_{\nu_1 1}b_{\nu_2 2}\cdots b_{\nu_n n} \tag{11.14}$$

where the sum is taken over all permutations and the $\pm$ sign is the one given by
(11.12).

Suppose we use (11.14) and take $A = I$. Then $C = B$ and we conclude that

$$D(B) = \sum \pm\, b_{\nu_1 1}\cdots b_{\nu_n n}. \tag{11.15}$$

This gives an explicit formula for $D$ and shows
that it is unique, if it exists.
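Formula (11.15) can be evaluated directly for small matrices, and substituting it back into (11.14) gives the multiplication law $D(AB) = D(A)D(B)$ named in the title of this section. The sketch below is only an illustration: the names `permutation_sign`, `det_leibniz` and `matmul` and the two sample matrices are assumptions introduced here, and the sign is computed by counting inversions, which gives the same $\pm$ as rearranging columns by interchanges.

```python
from itertools import permutations

def permutation_sign(perm):
    """Sign of a permutation of (0, ..., n-1), computed by counting inversions."""
    n = len(perm)
    inversions = sum(1 for i in range(n) for j in range(i + 1, n) if perm[i] > perm[j])
    return -1 if inversions % 2 else 1

def det_leibniz(M):
    """Sum over all permutations nu of (sign) * b[nu_1][1] * ... * b[nu_n][n]."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        term = permutation_sign(perm)
        for column, row in enumerate(perm):
            term *= M[row][column]
        total += term
    return total

def matmul(A, B):
    """Ordinary matrix product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

A = [[1, 2, 0], [0, 1, 3], [2, 0, 1]]     # hypothetical sample matrices
B = [[2, 1, 4], [0, 3, 1], [1, -1, 5]]
C = matmul(A, B)
```

Here `det_leibniz(A)` is 13, `det_leibniz(B)` is 21, and `det_leibniz(C)` equals their product. The sum has $n!$ terms, so this form is useful for checking small cases, not for computing large determinants.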
$D(B)$ is a linear function of each row, when all other rows are held fixed.

(11.iy 

Now in (11.14) take $i$ to be some number $1 \le i < n$ and take $A$ to be the matrix with

$$A_k = I_k \quad (k \ne i, i+1),$$
$$A_i = I_i + I_{i+1},$$
$$A_{i+1} = 0.$$

For example, with $n = 3$ and $i = 2$ we would have the matrix

$$\begin{pmatrix}1&0&0\\0&1&0\\0&1&0\end{pmatrix}.$$

Then $C = AB$ has all rows except the $(i+1)$st the same as the rows of $B$. The
$(i+1)$st row of $B$ is replaced by the $i$th. In the above example

$$\begin{pmatrix}1&0&0\\0&1&0\\0&1&0\end{pmatrix}\begin{pmatrix}1&5&9\\2&6&8\\3&7&4\end{pmatrix} = \begin{pmatrix}1&5&9\\2&6&8\\2&6&8\end{pmatrix}.$$

Also $D(A) = 0$ since it has one whole column zero. From this we conclude that

If $B$ has two adjacent rows equal, then $D(B) = 0$.

Thus

$D(B^T)$ satisfies axioms (11.1) and (11.2)

because replacing $B$ by its transpose, $B^T$, interchanges the role of rows and columns.
But $I^T = I$, so (11.3) is also satisfied. Thus $D(B^T)$ satisfies axioms (11.1)-(11.3), hence
by uniqueness must coincide with $D(B)$. In other words

$$D(B^T) = D(B). \tag{11.17}$$

11.3. The existence of determinants 

We shall now prove the existence of determinants. That is, we shall construct a 
function of n x n matrices that clearly satisfies (11.1), (11.2) and (11.3). 

For $n = 1$

define $D((a)) = a$.

For $n = 2$

define $D\begin{pmatrix} a & b\\ c & d \end{pmatrix} = ad - bc$.







It is easy to check directly that (11.1)-(11.3) are satisfied. We now proceed
inductively. Suppose that we assume the existence of $(n-1) \times (n-1)$ determinants.
Let

$$A = (a_{ik})$$

be an $n \times n$ matrix. Consider some definite position, say the position at the $i$th
row and $k$th column. Let us cancel the $i$th row and $k$th column in $A$ and take the
determinant of the remaining $(n-1)$-rowed matrix. This determinant multiplied
by $(-1)^{i+k}$ will be called the cofactor of $a_{ik}$ and be denoted by $A_{ik}$. The distribution of
the sign $(-1)^{i+k}$ follows the chessboard pattern, namely

+ - + - • • • 

- + - + • • • 

+ - + -••• 

- + - + • • • 


Let $i$ be any number from 1 to $n$. We consider the following function $D$ of the
matrix $A$:

$$D = a_{i1}A_{i1} + a_{i2}A_{i2} + \cdots + a_{in}A_{in}. \tag{11.18}$$

It is the sum of the products of the entries of the $i$th row and their cofactors.

Consider this $D$ and its dependence on a given column, say $A_k$. For $\nu \ne k$, $A_{i\nu}$
depends linearly on $A_k$ and $a_{i\nu}$ does not depend on it; for $\nu = k$, $A_{ik}$ does not depend
on $A_k$ but $a_{ik}$ is one element of this column. Thus (11.1) is satisfied. Assume next that
two adjacent columns $A_k$ and $A_{k+1}$ are equal. For $\nu \ne k, k+1$, we have then two
equal columns in $A_{i\nu}$, so that $A_{i\nu} = 0$. The determinants used in the computation of
$A_{ik}$ and $A_{i,k+1}$ are the same but the signs are opposite; hence $A_{ik} = -A_{i,k+1}$ whereas
$a_{ik} = a_{i,k+1}$. Thus $D = 0$ and (11.2) is satisfied. For the special case $A = I$
we have $a_{i\nu} = 0$ for $\nu \ne i$ while $a_{ii} = 1$, $A_{ii} = 1$. Hence $D = 1$ and this is (11.3). This
proves the existence of an $n$-rowed determinant as well as the truth of formula
(11.18).

Equation (11.18) may be generalized as follows. In our determinant replace the $i$th
row by the $j$th row and develop according to this new row. For $i \ne j$ that determinant
is 0 and for $i = j$ it is $D$:

$$a_{j1}A_{i1} + a_{j2}A_{i2} + \cdots + a_{jn}A_{in} = \begin{cases} D & \text{for } j = i,\\ 0 & \text{for } j \ne i. \end{cases} \tag{11.19}$$

If we interchange the rows and the columns, we get the following formula:

$$a_{1h}A_{1k} + a_{2h}A_{2k} + \cdots + a_{nh}A_{nk} = \begin{cases} D & \text{for } h = k,\\ 0 & \text{for } h \ne k. \end{cases} \tag{11.20}$$

Equation (11.20) says that, if we form the matrix

$$B = (A_{ik}),$$

called the cofactor matrix of $A$, then

$$B^TA = D(A)I.$$
Notice that we have already proved that if $A$ is singular (so that the columns of $A$ are
linearly dependent), then $D(A) = 0$. The preceding equation gives a formula for $A^{-1}$
if $D(A) \ne 0$. Thus we have proved

A matrix $A$ is invertible if and only if $D(A) \ne 0$. If $D(A) \ne 0$, then

$$A^{-1} = \frac{1}{D(A)}B^T \tag{11.21}$$

where $B$ is the cofactor matrix of $A$.

This formula for $A^{-1}$ is known as Cramer’s rule. It is not an effective way of
computing $A^{-1}$ if $n > 2$. (For $n = 2$, it coincides with the prescription in Chapter 1.)
For $n > 2$, it is better to use the algorithm described in Chapter 10. However,
Cramer’s rule does have theoretical importance. For instance, it shows that the
entries of $A^{-1}$ are all quotients of a polynomial in the entries of $A$ by the
determinant.
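The relation $B^TA = D(A)I$ and the resulting formula for the inverse can be checked on a small integer matrix. This is a sketch under assumptions, not a recommended numerical method: the function names and the sample matrix are hypothetical, entries are assumed to be integers so that `fractions.Fraction` gives exact quotients, and the determinant of the empty matrix is taken to be 1 by convention so that the cofactor of a $1 \times 1$ matrix comes out as 1.

```python
from fractions import Fraction

def minor(M, i, k):
    """Matrix obtained from M by cancelling row i and column k."""
    return [row[:k] + row[k + 1:] for r, row in enumerate(M) if r != i]

def det(M):
    """Development along the first row; the empty determinant is 1 by convention."""
    if not M:
        return 1
    return sum(M[0][k] * (-1) ** k * det(minor(M, 0, k)) for k in range(len(M)))

def cofactor_matrix(M):
    """The matrix B whose (i, k) entry is the cofactor A_ik."""
    n = len(M)
    return [[(-1) ** (i + k) * det(minor(M, i, k)) for k in range(n)] for i in range(n)]

def inverse_by_cofactors(M):
    """A^{-1} = B^T / D(A), valid only when D(A) is non-zero."""
    d = det(M)
    if d == 0:
        raise ValueError("matrix is singular: D(A) = 0")
    B = cofactor_matrix(M)
    n = len(M)
    return [[Fraction(B[k][i], d) for k in range(n)] for i in range(n)]

A = [[2, 1, 4],
     [0, 3, 1],
     [1, -1, 5]]                             # hypothetical sample matrix, D(A) = 21
B = cofactor_matrix(A)
n = len(A)
# (B^T A)[h][k] = sum over i of A_ih * a_ik, the left side of (11.20).
bt_a = [[sum(B[i][h] * A[i][k] for i in range(n)) for k in range(n)] for h in range(n)]
A_inv = inverse_by_cofactors(A)
product = [[sum(A_inv[i][j] * A[j][k] for j in range(n)) for k in range(n)] for i in range(n)]
```

Here `bt_a` is 21 times the identity matrix and `product` is the identity. Every entry of `A_inv` is a cofactor divided by the determinant, which is the theoretical point made above; the recursive cost grows factorially with $n$, in line with the remark that the rule is not an effective way of computing inverses.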


Summary 

A Determinants 

You should know the axioms for determinants and be able to use them directly for 
evaluation of determinants. 

You should be able to state and apply the rule for evaluating a determinant by use 
of cofactors. 


You should be able to state and apply Cramer’s rule for the inverse of a matrix.



(11.15) or (11.19) becomes quite unpleasant (involving 4! multiplications
and 4! − 1 additions) while the algorithm described in section 11.1 is quite
manageable. Here are several examples against which you can check your
arithmetic.


(a) Det 

ll 2 3 4\ 

5 8 11 12 

16 9 13 15 

= -2. | 


\7 in i a J 


Y 7 i i i i 




(b) $\operatorname{Det}\begin{pmatrix}0&2&2&2\\3&1&2&2\\3&3&4&2\\3&3&3&5\end{pmatrix} = 12.$



(c) $\operatorname{Det}\begin{pmatrix}1&1&1&1\\2&3&3&3\\2&2&3&3\\2&2&2&3\end{pmatrix} = 1.$









where $f(x) = (r_1 - x)(r_2 - x)(r_3 - x)(r_4 - x)$.

Hint: The determinant of the matrix below is a function $F(x)$.

$$\begin{pmatrix} r_1 - x & a - x & a - x & a - x\\ b - x & r_2 - x & a - x & a - x\\ b - x & b - x & r_3 - x & a - x\\ b - x & b - x & b - x & r_4 - x \end{pmatrix}$$

But it is a linear function of $x$ since we may subtract the first row from all
the remaining rows to eliminate the $x$ from all but the first row. Hence

$$F(x) = A + Bx$$

for some constants $A$ and $B$. But $F(a) = f(a)$ and $F(b) = f(b)$. So solve for $A$
and set $x = 0$. What does the formula become when $a = b$?


11.2. In generalization of example (c), show that

$$\operatorname{Det}\begin{pmatrix}1&1&1&1\\x&a&a&a\\x&y&b&b\\x&y&z&c\end{pmatrix} = (a - x)(b - y)(c - z).$$



11.3. In generalization of (c), show that

$$\operatorname{Det}\begin{pmatrix}1&1&1&1\\x&y&z&w\\x^2&y^2&z^2&w^2\\x^3&y^3&z^3&w^3\end{pmatrix} = (y - x)(z - x)(w - x)(z - y)(w - y)(w - z).$$

State and prove the corresponding fact with 4 replaced by $n$.
11.4. Show that

$$|\operatorname{Det}(A_1, \ldots, A_n)| \le \|A_1\| \cdots \|A_n\|.$$

When does equality hold?

(Hint: Use the interpretation of $|\operatorname{Det}|$ in terms of volume.)

11.5. Show that if $O$ is an orthogonal matrix (so $OO^T = I$), then $\operatorname{Det} O = \pm 1$.

11.6. A matrix $R$ is a rotation if $RR^T = I$ and $\operatorname{Det} R = +1$. Show that a rotation
in an odd-dimensional space always leaves at least one non-zero vector
fixed; i.e., $R$ has 1 as an eigenvalue.

(Hint: Consider $\operatorname{Det}(R - I)$.)








SUGGESTED READING



The short list of books that we give at the end of this section is not meant as 
bibliography. Rather, it consists of books that students of the course have found 
helpful in supplementing and extending the material covered in this volume. The 
book by Loomis and Sternberg can be considered as a companion text. The 
presentation there is more abstract and formal, with more of an emphasis on 
mathematical proof. The actual mathematical prerequisites are the same as for this 
definitions and argumentation are greater. On one or two occasions in this book and
in Volume 2 we have referred to Loomis and Sternberg for the detailed proof of
some key theorems. The Feynman Lectures form another general reference giving an
elegant presentation of physics at the level of this book.
One of our main subjects is linear algebra. The text by Halmos is a classic, with a
tilt towards extension of the finite dimensional theory in the direction of Hilbert




Our discussion in Chapter 1 started with the geometry of lines. The natural place
to go from there is to the study of projective geometry, and we gave some indications 
in this direction in the appendix to Chapter 1 and in Exercises 1.16-1.20. The text by 
Hartshorne gives a coherent introduction to the subject. 

At the end of Chapter 2 we make a brief mention of probability theory, and it is 
one of our major gaps that we don’t give a serious discussion of this important topic. 
A good all-round introduction to probability which does not make heavy
mathematical demands is given by the three volumes by Hoel, Port, and Stone.
Probability theory can easily lead into rather imposing mathematical machinery such
as measure theory and intricate questions in Fourier analysis. These books have the
advantage of illustrating the important ideas without getting into the subject deeply
enough to be entangled in heavy mathematics. The book by Moran is harder [...]








discussion of finite Markov chains and can be read as a continuation of Chapter 2. 
The book by Doyle and Snell [...]

absolute double integrals 281
absolute line integral 265
addition of velocities, law of 155
adjoint transformation 374-7
affine coordinate function 15
affine geometry 18
affine plane 1
affine space, axioms for 9
affine transformation 17
annihilator 356
arc length 265
areas and determinants [...]

basis 39, 348
basis, change of 41
beats 141
big “oh”, little “oh” 178
bijective 14
bilinear 123, 272

Cayley-Hamilton [...]
chain rule 184
change of basis matrix 41, 349
characteristic equation 59
characteristic polynomial 59
closed forms 262
cofactor matrix 398
collision 161
column reduction 367
composition 15
conditional probability 67
conformal linear transformation 57
conjugate planes 318
conservation of energy 161
conservation of energy momentum 165
conservation of momentum 161
constant rank theorem 368
coordinate function 15

De Moivre’s theorem 58
determinant 30, 388-99
determinant, axioms for 389-390
differentiable 180
differential form, linear 199, 247-61
differential of a map 180
dimension 347
directional derivatives 205-9
dual basis 350-2

Ehrenfest model
eikonal 324
elastic collision 161
energy momentum vector 165
Euclidean scalar product 120-4
Euclidean transformation 16
exact forms 250
exterior product 275

Fermat’s principle 326-8
Fibonacci sequence 80
focal length 319
force field 248
forced oscillator 108
formal power series 82
forward region 151
fundamental theorem of affine geometry 43
fundamental theorem of projective geometry 54

Galilean transformation 160
Gauss decomposition 322
Gibbs xii
Gram-Schmidt process 124-131

Hessian 225

implicit function theorem 238
inelastic collision 161
injective 13
inverse function theorem 230-7
inverse of a matrix 32, 33
isomorphism 40

Kepler motion 195
kernel 37, 358
kernel, finding a basis of 367

Lagrange multipliers 227-9
Laplace’s equation 227
Laplacian 227
law of cosines 124
light cone 152
line integrals 250-64
linear dependence 7, 34
linear differential form 199, 247-61
linear independence 7, 34, 53
linear optics 328-35
linear transformation 18, 20
lines, parametrization of 3, 4
Lorentz transformation 152

map [...]
matrix addition 24
matrix multiplication 22, 25, 26
matrix of a linear transformation 21
matrix of a rotation 22
mean value theorem 219-222
momentum in Newtonian mechanics 163
momentum in special relativity 165
Morse index theorem [...]

nilpotent matrix 38, 39
non-singular 20
normal forms for matrices 60, 62, 63
normal modes 137-48
normal modes as waves 145
null cone 152

one-dimensional vector space 10
optical length 325
orientation 31, 285-9
orthogonal projection 130
orthonormal basis 127
oscillator 103-12
overdamped oscillator 106

perspective 51
phase portrait 95-103
Poincaré transformation 154
point characteristic 324
probability 66, 67
projection 38
projective plane 53
proper Lorentz transformation 152
pullback 209-13, 289-95

quadratic form 133
quotient space 354

rank-nullity theorem 358
regular 20
resonance 112, 141
response curve 112
rest mass 165
reverse triangle inequality 157
Riemann xv
row reduction 360-8

saddle point 135
scalar product, axioms for 123, 131
simultaneity 154
singular linear transformation 20
Snell’s law 313
spacetime 149
special relativity 148-66
star shaped 261-2
steady state solution 110
stochastic matrix 71
subspace 343-4
surjective 14
symmetric matrix 133
symplectic group 167
symplectic scalar product 167

tangent space 208
Taylor’s formula 222-5
thin lens 318
thin lens, matrix of 319
time, Newton’s concept of 11-13
trace of a matrix 38
transition probability 68
translation 18
transpose 133
twin paradox 157
two forms in ℝ² 277
two forms in ℝ³ 296

undamped oscillator 104
underdamped oscillator 105

variation of parameters 109
vector space 7, 341-2
vector space, axioms for 8
velocity transformation 160









This textbook has been developed from a course taught at Harvard over the last
decade. The course covers principally the theory and physical applications of linear
algebra, and of the calculus of several variables, particularly the exterior calculus.

The authors adopt the ‘spiral method’ of teaching, covering the same topic several
times at increasing levels of sophistication and range of application. Thus the
student develops a deep intuitive understanding of the subject as a whole and an
appreciation of the natural progression of ideas.

The first four chapters deal with the algebra and analysis of square, in particular
2 × 2, matrices. In these chapters such matters as determinants and their relation to
area and orientation, vector spaces, conformal linear geometry in the plane,
eigenvalues and eigenvectors, the power of a matrix, Markov chains, homogeneous
linear differential equations, the exponential of a matrix, scalar products, quadratic
forms, and special relativity are explored.

The next two chapters cover differential calculus, beginning with the differential of
a map between vector spaces, and discussing such topics as the chain rule, Kepler
motion, the Born approximation, directional and partial derivatives, and linear
differential forms. In Chapter 6 topics covered include vector versions of the
mean-value theorem, Taylor’s formula, and the inverse function theorem, critical
point behaviour and Lagrange multipliers.

Cover design Ken Vail 


In Chapters 7 and 8 attention moves to the integral calculus, the student
progressing from linear differential forms and their line integrals to exterior
two-forms and their corresponding two-dimensional integrals. The exterior
derivative is introduced and invariance under pullback is stressed. Green’s
theorem is proved, and surface integrals in three-space are studied.

In Chapter 9 the mathematics of the first eight chapters is applied to the theory of
optics.

The last two chapters contain generalizations and developments of the theory of
vector spaces and determinants.

‘... this book will serve as a fundamental text not only for students in physics, but
also for students in mathematics interested in the most evident applications of
mathematical definitions, results and theories.’

Padiatre and Padologie

‘... there is to my knowledge no comparable book, and it is hard to imagine a more
inspiring one.’

Times Higher Education Supplement

‘Not only is the mathematics clean, elegant, and modern, but the presentation is
humane, especially for a mathematics text. Examples are provided before
generalisation, and motivation and applications are kept firmly in view. This is
first rate!’

American Journal of Physics



















